Abstract

The stability method is very useful for obtaining exact solutions of many extremal
graph problems. Its key step is to establish the stability property which,
roughly speaking, states that any two almost optimal graphs of the same order $n$
can be made isomorphic by changing $o(n^2)$ edges.

Here we show how the recently developed theory of graph limits can be used to 
give an analytic approach to stability. As an application, we present a new proof 
of the Erdős-Simonovits Stability Theorem.

Also, we investigate various properties of the edit distance. In particular, we 
show that the combinatorial and fractional versions are within a constant factor 
from each other, thus answering a question of Goldreich, Krivelevich, Newman, and 
Rozenberg. 



1 Introduction 

The notion of the left convergence of graph sequences was introduced by Borgs, Chayes,
Lovász, Sós, and Vesztergombi (2003, unpublished) and was developed in |i| [Gf [Tf [8], [9], [12],
[16], [23], [28], [29], [31] and other papers. Benjamini and Schramm [1] introduced convergence
for graphs of bounded maximum degree. Tardos [M] defined limits of trees. Lovász [27]
presents a nice survey of this area.

*Partially supported by the National Science Foundation, Grant DMS-0758057, and the Berkman
Faculty Development Fund, CMU.
It is possible that graph limits will become a very powerful tool, especially in extremal
graph theory. The left limits are closely related to the (Weak) Regularity Lemma, see
Lovász and Szegedy [23], which is a very important and useful result. The algebraic
characterization by Lovász and Szegedy [28, Theorem 2.2] of possible limiting subgraph
densities seems to have great potential. Although these developments are very recent,
Razborov [35, 36] has already used graph limits to make spectacular progress on
the long-standing Rademacher-Turán problem. Also, graph limits have proved helpful
for property and parameter testing, see Benjamini, Schramm, and Shapira [2], Borgs et
al. [5], Elek [11], Lovász and Szegedy [30], and others.

Here is an example of how graph limits may be applied to extremal graph problems. 

Suppose that the convergence on graphs is encoded by a compact metric space $(X,\delta)$
and a map that assigns to each graph $G$ a point $A(G)$ of $X$ and respects graph
isomorphism (that is, $A(G) = A(H)$ whenever $G \cong H$). Then we say that a sequence of
graphs $(G_n)_{n\in\mathbb{N}}$ converges if the sequence $(A(G_n))_{n\in\mathbb{N}}$ is Cauchy in the metric $\delta$. In this
case, the limit of $(G_n)_{n\in\mathbb{N}}$ is the (unique) limiting point of the sequence $(A(G_n))_{n\in\mathbb{N}}$ in
$(X,\delta)$, which exists since $X$ is compact.

Suppose that we are given a graph parameter $f$, that is, a function on graphs that
respects graph isomorphism, and a graph property $\mathcal{P}$, that is, a family of graphs closed
under isomorphism. Let $\mathcal{P}_n = \{G \in \mathcal{P} : v(G) = n\}$ consist of all graphs in $\mathcal{P}$ with $n$
vertices. The corresponding extremal $(f,\mathcal{P})$-problem is to determine for each $n$

$$\mathrm{ex}_f(n,\mathcal{P}) = \max\{f(G) : G \in \mathcal{P}_n\},$$
$$\mathrm{EX}_f(n,\mathcal{P}) = \{G \in \mathcal{P}_n : f(G) = \mathrm{ex}_f(n,\mathcal{P})\},$$

the maximum of $f(G)$ over all graphs from $\mathcal{P}_n$ as well as the set of extremal graphs, i.e.
graphs that achieve this maximum. For example, if we let $h(G)$ be the maximum size of a
homogeneous set (a clique or an independent set) in a graph $G$, $f(G) = -h(G)/\log_2 v(G)$
be its scaled version, and $\mathcal{P}$ be the family of all graphs, then we obtain the inverse problem
for the diagonal Ramsey numbers. Many extremal graph problems can be represented
this way.
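For very small $n$ both quantities can be found by exhaustive search. The following Python sketch is only an illustration under stated assumptions: the helper names `extremal`, `triangle_free` and `edge_count` are hypothetical, the property is taken to be "triangle-free", and the parameter is the unscaled edge count (scaling by a factor depending only on $n$ would not change the maximizers).

```python
from itertools import combinations

def extremal(n, f, in_property):
    """Return (ex_f(n, P), EX_f(n, P)) by enumerating all labelled graphs on n vertices."""
    pairs = list(combinations(range(n), 2))
    best, best_graphs = None, []
    for mask in range(1 << len(pairs)):
        # Each bit of mask decides whether the corresponding pair is an edge.
        edges = frozenset(p for k, p in enumerate(pairs) if (mask >> k) & 1)
        if not in_property(n, edges):
            continue
        value = f(n, edges)
        if best is None or value > best:
            best, best_graphs = value, [edges]
        elif value == best:
            best_graphs.append(edges)
    return best, best_graphs

def triangle_free(n, edges):
    """Illustrative property P: the graph contains no triangle."""
    return not any({(a, b), (a, c), (b, c)} <= edges
                   for a, b, c in combinations(range(n), 3))

def edge_count(n, edges):
    """Illustrative parameter f: the number of edges."""
    return len(edges)

best, graphs = extremal(5, edge_count, triangle_free)
print(best, len(graphs))
```

The returned list contains labelled graphs, so one isomorphism class of extremal graphs may appear several times. The search is exponential in $n^2$ and is meant only for sanity checks on tiny cases.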

Let us try to formulate some approximation (the "limiting" case) of the problem as
$n \to \infty$. We suggest the following definition. Let the limit set $\mathrm{LIM}(f,\mathcal{P})$ consist of those
$x \in X$ for which there is an infinite increasing sequence of indices $n_1 < n_2 < n_3 < \dots$
and graphs $G_{n_i} \in \mathcal{P}_{n_i}$ such that

$$\lim_{i\to\infty} \bigl(f(G_{n_i}) - \mathrm{ex}_f(n_i,\mathcal{P})\bigr) = 0 \qquad (1)$$

and the sequence $(G_{n_i})_{i\in\mathbb{N}}$ converges to $x$, that is,

$$\lim_{i\to\infty} \delta(A(G_{n_i}), x) = 0.$$

Although we are ultimately interested in $\mathrm{EX}_f(n,\mathcal{P})$, we do not require that
$G_{n_i} \in \mathrm{EX}_f(n_i,\mathcal{P})$ here. One of the reasons is that we often know $\mathrm{ex}_f(n,\mathcal{P})$ asymptotically but
not exactly, in which case one can test whether (1) holds but not the membership in $\mathrm{EX}_f(n,\mathcal{P})$.

Now, we can try to study the set $\mathrm{LIM}(f,\mathcal{P})$, which is independent of $n$. If we succeed
in completely describing it, then we might be able to discover some information
about extremal graphs. Indeed, if we select arbitrary extremal graphs $G_n \in \mathrm{EX}_f(n,\mathcal{P})$
for infinitely many $n$, then, by the compactness of $(X,\delta)$, there always is a convergent
subsequence, whose limit belongs to $\mathrm{LIM}(f,\mathcal{P})$. Suppose that this convergence implies
some structural statement (in purely graph theoretical terms) that necessarily occurs
for infinitely many of the selected extremal graphs. Then one can conclude that the
statement fails only for finitely many extremal graphs overall.

One can call this approach the limit method. It applies in principle to very general
settings. For example, the families $\mathcal{P}_n$ need not be related to each other for different
$n$, nor does the graph parameter $f$ have to behave well with respect to taking limits: the
above definitions make perfect sense for arbitrary $f$ and $\mathcal{P}$ (and $\mathrm{LIM}(f,\mathcal{P}) \neq \emptyset$ provided
infinitely many of the sets $\mathcal{P}_n$ are non-empty). Also, the definition of the limit set may be
modified to work with other extremal problems, namely those indexed by a parameter
other than the order of a graph.

Since the limit method deals only with some approximation of the extremal problem,
one would hope to obtain only the asymptotic of $\mathrm{ex}_f(n,\mathcal{P})$ at best. However, this approach
might work well together with the so-called stability method that has proved very
useful in solving many extremal problems exactly (including the description of $\mathrm{EX}_f(n,\mathcal{P})$)
for all large $n$.

The stability method proceeds as follows. Suppose that we know the value of $\mathrm{ex}_f(n,\mathcal{P})$
asymptotically and that we have some set $\mathcal{C}_n$ believed to be exactly the set $\mathrm{EX}_f(n,\mathcal{P})$ for
all large $n$. Assume that $\mathcal{C}_n \subseteq \mathcal{P}_n$ and $f$ is constant on $\mathcal{C}_n$. (Of course, these assumptions
are necessary for $\mathcal{C}_n = \mathrm{EX}_f(n,\mathcal{P})$ and, usually, they are easy to check.) Given $\mathcal{C}_n$, we
have to prove first that for any almost extremal graph $G \in \mathcal{P}_n$ (i.e. $G \in \mathcal{P}_n$ satisfying
$f(G) = \mathrm{ex}_f(n,\mathcal{P}) - o(1)$) there is $H \in \mathcal{C}_n$ such that $\hat\delta_1(G,H) = o(1)$, where

$$\hat\delta_1(G,H) = \frac{2}{n^2}\,\min\bigl\{\,|E(G)\,\triangle\,\sigma(E(H))| : \text{bijective } \sigma : V(H) \to V(G)\bigr\} \qquad (2)$$

is the edit distance between two graphs of the same order $n$: it is $2/n^2$ times the minimum
number of adjacencies that one has to change in $G$ to make it isomorphic to $H$. Next,
pick an arbitrary $G \in \mathrm{EX}_f(n,\mathcal{P})$ for a sufficiently large $n$. By the above, we know that $G$
is close in the distance $\hat\delta_1$ to the graph property $\mathcal{C}_n$. In order to complete the proof, it is
enough to argue that $G$ is necessarily in $\mathcal{C}_n$. Here we can use various arguments, such as
applying "local improvements" to $G$ or arguing that every "wrong" adjacency in $G$ bears
too much penalty. Knowing all but $o(n^2)$ edges of $G$ greatly helps in this task; this is
what makes this method so successful. This approach was pioneered by Simonovits [37]
in the late 1960s. It has been used to obtain exact solutions for an impressive array of
problems since then.
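The edit distance between two graphs of the same order can be evaluated directly from its definition on small instances. The Python sketch below is a brute-force illustration only (it tries all $n!$ bijections); the function name `edit_distance` and the encoding of graphs as edge lists on the vertex set $\{0,\dots,n-1\}$ are assumptions made for this example.

```python
from fractions import Fraction
from itertools import permutations

def edit_distance(n, edges_g, edges_h):
    """2/n^2 times the least number of adjacencies of G that must be changed
    to obtain a graph isomorphic to H (both graphs on vertices 0..n-1)."""
    eg = {frozenset(e) for e in edges_g}
    best = None
    for sigma in permutations(range(n)):
        # Image of E(H) under the bijection sigma : V(H) -> V(G).
        mapped = {frozenset((sigma[u], sigma[v])) for u, v in edges_h}
        changes = len(eg ^ mapped)  # size of the symmetric difference
        if best is None or changes < best:
            best = changes
    return Fraction(2 * best, n * n)

path = [(0, 1), (1, 2), (2, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edit_distance(4, path, cycle))
```

Exact rational arithmetic is used so that small distances can be compared without rounding issues.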

The term "stability" refers to the property that every almost extremal graph has
structure almost the same as some extremal graph. A class of extremal problems for
which this method seems to be particularly suited is when there is only one pattern
independent of $n$ for all almost extremal graphs. In order to state this property formally,
we have to define a version of edit distance for arbitrary pairs of graphs. Namely, the
$\delta_1$-distance, denoted by $\delta_1(G,H)$, between graphs $G$ and $H$ on vertex sets $\{x_1,\dots,x_m\}$ and
$\{y_1,\dots,y_n\}$ respectively is the minimum over all non-negative $m \times n$ matrices $A = (a_{ij})$
with row sums $1/m$ and column sums $1/n$ of

$$\delta_1(G,H,A) = \sum_{(i,j,g,h)\in\Delta} a_{ig}\,a_{jh}, \qquad (3)$$

where $\Delta$ consists of all quadruples $(i,j,g,h) \in [m]^2 \times [n]^2$ such that exactly one of the
following two relations holds: either $\{x_i,x_j\} \in E(G)$ or $\{y_g,y_h\} \in E(H)$. Informally
speaking, we view $G$ and $H$ as uniformly vertex-weighted graphs of total weight 1 while
$a_{ij}$ tells what fraction of vertex $x_i$ is mapped into vertex $y_j$. It is not hard to show (see
Section 3) that this defines a pre-metric on the set of graphs, that is, $\delta_1$ is symmetric,
non-negative and satisfies the Triangle Inequality (but may assume value zero on distinct
graphs: e.g. $\delta_1(K_{m,m}, K_{n,n}) = 0$ for any $m,n > 0$).
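To make the fractional overlay concrete, the following Python sketch evaluates the sum of products $a_{ig}a_{jh}$ over all quadruples for which exactly one of the two adjacency relations holds, for a given matrix $A$. It does not minimise over $A$; the function name `fractional_cost` and the edge-list encoding are assumptions of this illustration.

```python
from fractions import Fraction

def fractional_cost(edges_g, edges_h, a):
    """Sum of a[i][g] * a[j][h] over quadruples (i, j, g, h) such that exactly
    one of {x_i, x_j} in E(G), {y_g, y_h} in E(H) holds.
    `a` is an m x n matrix with row sums 1/m and column sums 1/n."""
    m, n = len(a), len(a[0])
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    total = Fraction(0)
    for i in range(m):
        for j in range(m):
            in_g = frozenset((i, j)) in eg  # False when i == j
            for g in range(n):
                for h in range(n):
                    in_h = frozenset((g, h)) in eh
                    if in_g != in_h:
                        total += a[i][g] * a[j][h]
    return total

q = Fraction(1, 4)
# G = K_{1,1} on {0, 1}; H = K_{2,2} with parts {0, 1} and {2, 3}.
k11 = [(0, 1)]
k22 = [(0, 2), (0, 3), (1, 2), (1, 3)]
# Vertex 0 of G is spread evenly over one part of H, vertex 1 over the other.
overlay = [[q, q, 0, 0], [0, 0, q, q]]
print(fractional_cost(k11, k22, overlay))
```

With this overlay every quadruple of positive weight satisfies both relations or neither, which is the mechanism behind complete bipartite graphs of different orders being at distance zero.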

Note that, for graphs $G_1$ and $G_2$ of the same order, we trivially have $\hat\delta_1(G_1,G_2) \ge
\delta_1(G_1,G_2)$. This inequality is in general strict (see Arie Matsliah's example presented
in the technical report [20, Appendix B] or Example 13 here). However, we prove in
Lemma [H] that

$$\hat\delta_1(G_1,G_2) \le 3\,\delta_1(G_1,G_2), \qquad (4)$$

answering in the affirmative an open question posed by Goldreich, Krivelevich, Newman,
and Rozenberg [20, Section 6] (see [21] for the journal version).

Now, let us say that the extremal $(f,\mathcal{P})$-problem is stable if for every $\varepsilon > 0$ there are
$\varepsilon' > 0$ and $n_0$ such that for every $n_1,n_2 \ge n_0$ and every two graphs $G_1, G_2$ with $G_i \in \mathcal{P}_{n_i}$
and $f(G_i) \ge \mathrm{ex}_f(n_i,\mathcal{P}) - \varepsilon'$ for $i = 1,2$, we necessarily have $\delta_1(G_1,G_2) \le \varepsilon$. Theorem [T^
here gives an alternative characterization of stable extremal problems. However, we
postpone the exact statement as well as the proof until Section O after we define graph
limits in Section 2 and extend the distance to them in Section 3.

For example, our approach applies to the Turán problem that asks for the maximum
size of an $\mathcal{F}$-free graph of order $n$. This is a central question of extremal graph theory that
was introduced by Turán [39]. Its scaled version can be represented in our notation as
$\mathrm{ex}_\rho(n, \mathrm{Forb}(\mathcal{F}))$, where $\rho(G) = 2e(G)/(v(G))^2$ denotes the edge density of $G$ and $\mathrm{Forb}(\mathcal{F})$
consists of all $\mathcal{F}$-free graphs. By applying our Theorem [THl we obtain a new proof of the
following celebrated result in Section 6.

Theorem 1 (The Erdős-Simonovits Stability Theorem [14, 37]) For every (possibly
infinite) family $\mathcal{F}$ of non-empty graphs the extremal $(\rho, \mathrm{Forb}(\mathcal{F}))$-problem is stable.

It is well-known that $\mathrm{ex}_\rho(n, \mathrm{Forb}(\mathcal{F})) = \frac{r-1}{r} + o(1)$, where

$$r = \min\{\chi(F) : F \in \mathcal{F}\} - 1 \ge 1, \qquad (5)$$

and the lower bound is given by the Turán graph $T_r(n) \in \mathrm{Forb}(\mathcal{F})$, the complete
$r$-partite graph on $[n]$ with parts of size $\lfloor n/r \rfloor$ or $\lceil n/r \rceil$. Thus, by (4), Theorem 1 can be
reformulated in the more familiar form that for any $\varepsilon > 0$ there are $\varepsilon' > 0$ and $n_0$ such
that every $\mathcal{F}$-free graph with $n \ge n_0$ vertices and at least $\bigl(\frac{r-1}{r} - \varepsilon'\bigr)\binom{n}{2}$ edges can be made
isomorphic to $T_r(n)$ by changing at most $\varepsilon\binom{n}{2}$ edges.
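As a small numerical companion, the Python sketch below builds the Turán graph and computes its edge density $2e(G)/v(G)^2$. The names `turan_edges` and `edge_density` are hypothetical, and placing vertex $v$ into part $v \bmod r$ is just one convenient way (an assumption of this sketch) to obtain parts of size $\lfloor n/r \rfloor$ or $\lceil n/r \rceil$.

```python
from fractions import Fraction

def turan_edges(n, r):
    """Edges of the complete r-partite graph on {0, ..., n-1} whose parts are
    the residue classes modulo r (sizes floor(n/r) or ceil(n/r))."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if u % r != v % r]

def edge_density(n, edges):
    """The scaled parameter 2 e(G) / v(G)^2."""
    return Fraction(2 * len(edges), n * n)

for n in (6, 12, 60):
    print(n, edge_density(n, turan_edges(n, 3)))
```

When $r$ divides $n$ the density equals $(r-1)/r$ exactly; otherwise it differs from $(r-1)/r$ by a term that tends to 0 as $n$ grows.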

Theorem 1 was first applied by Simonovits [37] to determine the exact value of the
Turán function $\mathrm{ex}(n, F)$ for various forbidden graphs $F$. This theorem has a huge number
of applications. For example, Theorem 1 turned up quite a few times in the author's
research alone: see the papers with Jiang [2l], Lazebnik and Woldar [25], Loh and
Sudakov [26], Mubayi [32], Yilma [3l]. Another proof of Theorem 1 was recently discovered
by Füredi [18].

2 Graph Limits 

Here we present the main definitions of "dense" graph limits. This notion of convergence 
(also called the left convergence in [H Section 2.2]) will be of main interest for this paper. 
We refer the reader to e.g. [8] for further details. 

Until recently, measure-theoretic methods were rare in discrete mathematics
(compared with, for example, linear algebra or topological tools). Bearing in mind a
combinatorialist reader who does not use real analysis in research, we decided to take
extra care with measure-theoretic concepts and to give references or detailed explanations
whenever feasible (even of some fairly standard results). For example, the result
of Lemma 11 is stated in [30, Page 5] without proof; here we carefully fill in all missing
details. All analytical terms that we do not define can be found in the book by
Folland [15].

Let $\mathbb{R}$ denote the set of reals and $I \subseteq \mathbb{R}$ denote the closed unit interval $[0,1]$. For
$Y \subseteq \mathbb{R}^n$, let $\mathcal{L}_Y = \{A \cap Y : A \in \mathcal{L}\}$ denote the restriction of the $\sigma$-algebra $\mathcal{L}$ of Lebesgue
measurable subsets of $\mathbb{R}^n$ to $Y$. If $Y \subseteq \mathbb{R}^n$ is Lebesgue measurable, then $\mu_Y$ denotes
the restriction of the Lebesgue measure $\mu$ to $\mathcal{L}_Y$. Let $\mathcal{B}_Y = \{A \cap Y : A \in \mathcal{B}\}$ be the
restriction of the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}^n$ to $Y$. When the set $Y$ is clear from
the context, we write $\mathcal{L}$, $\mu$, and $\mathcal{B}$ for $\mathcal{L}_Y$, $\mu_Y$, and $\mathcal{B}_Y$ respectively. We say that some
property holds almost everywhere (abbreviated as a.e.) if the set of $x$ for which it fails
has Lebesgue measure 0. A measurable function is called simple if it assumes only finitely
many values.

A function $W : I^2 \to \mathbb{R}$ is called symmetric if $W(x,y) = W(y,x)$ for every $x,y \in I$.
Let $\mathcal{W}$ consist of all symmetric bounded measurable functions $W : (I^2,\mathcal{L}) \to (\mathbb{R},\mathcal{B})$.
Following [8], we call the elements of $\mathcal{W}$ graphons. Let $\mathcal{W}_I$ consist of those graphons
$W \in \mathcal{W}$ such that $0 \le W(x,y) \le 1$ for every $x,y \in I$.

A function $\phi : (I,\mathcal{L},\mu) \to (I,\mathcal{L},\mu)$ is called measure preserving if it is measurable
and $\mu(\phi^{-1}(A)) = \mu(A)$ for any $A \in \mathcal{L}_I$. Let $\Phi$ consist of all such functions. Note that
$\phi \in \Phi$ may be very far from being invertible as e.g. $\phi(x) = 2x - \lfloor 2x \rfloor$ shows. Let $\Phi_0$
consist of bijections $\phi : I \to I$ such that both $\phi$ and $\phi^{-1}$ belong to $\Phi$. Clearly, each of
$\Phi$ and $\Phi_0$ is closed under taking compositions of functions. For $\phi \in \Phi$ and $W \in \mathcal{W}$, let
$W^\phi$ be defined by $W^\phi(x,y) = W(\phi(x),\phi(y))$. It is easy to see that $W^\phi \in \mathcal{W}$ and for any
$\psi \in \Phi$ we have

$$(W^\phi)^\psi = W^{\phi\circ\psi}. \qquad (6)$$
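The doubling map mentioned above is a convenient example to experiment with. The Python sketch below is an illustration on a finite grid, not a proof: it checks that the map is two-to-one and that the fraction of grid points sent into an interval $[a,b)$ matches the length $b-a$. The names `doubling` and `preimage_fraction` and the grid size are assumptions of this example.

```python
from fractions import Fraction
from math import floor

def doubling(x):
    """The map x -> 2x - floor(2x) on [0, 1]."""
    y = 2 * x
    return y - floor(y)

def preimage_fraction(a, b, grid):
    """Fraction of grid points k/grid (0 <= k < grid) mapped into [a, b)."""
    hits = sum(1 for k in range(grid) if a <= doubling(Fraction(k, grid)) < b)
    return Fraction(hits, grid)

# Two distinct points with the same image: the map is far from invertible.
print(doubling(Fraction(1, 4)), doubling(Fraction(3, 4)))
# The pre-image of [1/5, 1/2) is [1/10, 1/4) together with [3/5, 3/4).
print(preimage_fraction(Fraction(1, 5), Fraction(1, 2), 1000))
```

The pre-image of an interval consists of two intervals of half the length each, which is why the measure is preserved although the map is not injective.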

A few remarks are in order. It is standard (see e.g. [15, Page 44]) to consider the
$\sigma$-algebra $\mathcal{B}$ of Borel sets whenever (a subset of) $\mathbb{R}^n$ is the range of a function from some
measure space. This has many advantages: we can add or multiply such functions [15,
Proposition 2.6], take pointwise limits [15, Proposition 2.7], etc., with the resulting function
being measurable. In particular, by [15, Theorem 6.6], the vector space

$$L^1 := L^1(I^2,\mathcal{L},\mu) = \{\text{integrable } W : (I^2,\mathcal{L},\mu) \to (\mathbb{R},\mathcal{B})\}/\!\sim, \qquad (7)$$

where we write $U \sim W$ if $U = W$ a.e., is a Banach space with respect to the $\ell_1$-norm

$$\|W\|_1 = \int_{I^2} |W(x,y)|\,d\mu(x,y). \qquad (8)$$

On the other hand, DiBenedetto [10, Section 14.1] demonstrates that the set of measurable
functions from $(I,\mathcal{L})$ to $(I,\mathcal{L})$ is not closed under taking pointwise limits (nor under
multiplication, nor under addition, even if we take the interval $[0,2]$ as the new range,
as some easy modifications of his example can show). Note that, by definition, the set $\Phi$
consists of Lebesgue-to-Lebesgue measurable functions (so that, e.g., for every $W \in \mathcal{W}_I$
and $\phi \in \Phi$ we have $W^\phi \in \mathcal{W}_I$).

One can show that for any $W \in \mathcal{W}$ there is $U \in \mathcal{W}$ such that $W = U$ a.e. and
$U$ is measurable as a function $(I^2,\mathcal{B}) \to (\mathbb{R},\mathcal{B})$. (Indeed, by writing the values of $W$
in base 2, represent $W = \sum_{i\in\mathbb{Z}} 2^i\,\mathbb{I}_{X_i}$ as a linear combination of the indicator functions
of Lebesgue sets $X_i \in \mathcal{L}_{I^2}$ and then replace each $X_i$ by some Borel set $Y_i \in \mathcal{B}_{I^2}$ with
$\mu(X_i \triangle Y_i) = 0$.) This allows some flexibility in the definitions above. Still, in order to
eliminate any ambiguity, we decided to specify the corresponding $\sigma$-algebras whenever
the measurability of functions may matter.

Also, note that every graphon $W \in \mathcal{W}$, as a bounded measurable function on the
finite measure space $(I^2,\mathcal{L},\mu)$, is integrable (see [15, Section 2.2]), that is, $W \in L^1$.

Finally, let us remark that the standard definition of $L^1$ allows functions to assume
values $\pm\infty$. (This is convenient in the statements of many theorems of real analysis.)
Since any integrable function assumes value $\pm\infty$ on a set of measure 0 and we identify
a.e. equal functions, we can restrict ourselves in (7) to functions with values in $\mathbb{R}$ only.

For any integrable function $W : (I^2,\mathcal{L},\mu) \to (\mathbb{R},\mathcal{B})$ (in particular, for any graphon
$W$), define its cut-norm (also called the box-norm, rectangle-norm, etc.) by

$$\|W\|_\square = \sup_{S,T\in\mathcal{L}_I}\left|\int_{S\times T} W(x,y)\,d\mu(x,y)\right|. \qquad (9)$$

The cut-distance $\delta_\square(U,W)$ between $U,W \in \mathcal{W}$ is the infimum of $\|U - W^\phi\|_\square$ over all
$\phi \in \Phi_0$. See [8, Lemma 3.5] for other equivalent ways to define this distance. For any
$S,T \in \mathcal{L}_I$, $\phi \in \Phi_0$, and an integrable function $W : (I^2,\mathcal{L},\mu) \to (\mathbb{R},\mathcal{B})$, we have

$$\int_{S\times T} W(x,y)\,d\mu(x,y) = \int_{\phi^{-1}(S)\times\phi^{-1}(T)} W^\phi(x,y)\,d\mu(x,y), \qquad (10)$$

which is easiest to see from the definition of the Lebesgue integral by approximating $W$
by simple functions [15, Section 2.2]. It follows that $\|U - W^\phi\|_\square = \|U^{\phi^{-1}} - W\|_\square$ and
that $\delta_\square$ is a pre-metric on $\mathcal{W}_I$ (see the argument leading to (18)).
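For a step function with $k$ equal steps, the integral over $S \times T$ is bilinear in the fractions of each step covered by $S$ and by $T$, so its absolute value is maximised when $S$ and $T$ are unions of steps. Under that observation, the cut-norm of a small step function can be computed by brute force. The Python sketch below is an illustration with hypothetical helper names (`cut_norm_step`, `l1_norm_step`); it also prints the $\ell_1$-norm for comparison.

```python
from fractions import Fraction
from itertools import product

def cut_norm_step(values):
    """Cut-norm of the step function equal to values[i][j] on the cell
    [i/k, (i+1)/k) x [j/k, (j+1)/k); S and T range over unions of steps."""
    k = len(values)
    best = 0
    for s in product((0, 1), repeat=k):
        for t in product((0, 1), repeat=k):
            total = sum(values[i][j]
                        for i in range(k) for j in range(k) if s[i] and t[j])
            best = max(best, abs(total))
    return Fraction(best) / (k * k)  # each cell has measure 1/k^2

def l1_norm_step(values):
    """The l_1-norm of the same step function."""
    k = len(values)
    return Fraction(sum(abs(v) for row in values for v in row)) / (k * k)

checkerboard = [[1, -1], [-1, 1]]
print(cut_norm_step(checkerboard), l1_norm_step(checkerboard))
```

The checkerboard example shows how cancellation can make the cut-norm much smaller than the $\ell_1$-norm. The search takes $4^k$ steps and is intended for tiny $k$ only.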

For a graphon $W \in \mathcal{W}$ we consider its equivalence class

$$[W] = \{U \in \mathcal{W} : \delta_\square(U,W) = 0\}.$$

Let

$$X = \{[W] : W \in \mathcal{W}_I\} \qquad (11)$$

consist of those equivalence classes that have a representative in $\mathcal{W}_I$. We call elements of
$X$ graph limits. The pre-metric $\delta_\square$ induces a metric on $X$, which we still denote by the
same symbol $\delta_\square$.

Usually, it is more convenient to operate with graphons, understanding equivalence
classes implicitly. But here we try to be as explicit as is reasonably possible. Since
the words "graph" and "limit" are frequently used in this paper in various contexts,
we will use (in the absence of a better name) the term graphit when referring to an
equivalence class $[W]$ with $W \in \mathcal{W}_I$. (One might view the terms "graphon" and "graphit"
as abbreviations of "graph function" and "graph limit".)

For a graph $G$ on vertices $\{x_1,\dots,x_n\}$, the corresponding element of $X$ is $A(G) =
[W_G]$, where $W_G \in \mathcal{W}_I$ is defined by

$$W_G(x,y) = \begin{cases} 1, & \text{if } (x,y) \in \bigl(\tfrac{i-1}{n},\tfrac{i}{n}\bigr)\times\bigl(\tfrac{j-1}{n},\tfrac{j}{n}\bigr) \text{ for some } \{x_i,x_j\} \in E(G),\\ 0, & \text{for all other } (x,y) \in I^2, \end{cases} \qquad (12)$$

that is, we encode the adjacency matrix of $G$ by a function $W_G \in \mathcal{W}_I$. Clearly, the
graphit $A(G)$ does not depend on the labeling of $V(G)$ (while the graphon $W_G$ does in
general).
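The encoding of an adjacency matrix by a function on the unit square is easy to evaluate pointwise. The Python sketch below assumes the convention that the $i$-th vertex occupies the open interval $((i-1)/n, i/n)$, with value 0 on cell boundaries; the factory name `graphon_of_graph` and the 0-based vertex labels are assumptions of this illustration.

```python
from math import floor

def graphon_of_graph(n, edges):
    """Return a function w(x, y) on [0, 1]^2 encoding the adjacency matrix of
    the graph with vertices 0..n-1: w is 1 on the open cell of an adjacent
    pair and 0 elsewhere (including cell boundaries)."""
    adjacent = {frozenset(e) for e in edges}

    def w(x, y):
        i, j = floor(x * n), floor(y * n)
        if x * n == i or y * n == j:  # boundary of a cell (includes x = 1)
            return 0
        return 1 if frozenset((i, j)) in adjacent else 0

    return w

w = graphon_of_graph(3, [(0, 1), (1, 2)])  # path on three vertices
print(w(0.1, 0.5), w(0.5, 0.1), w(0.1, 0.9), w(0.1, 0.2))
```

Relabelling the vertices permutes the cells, which changes the function but not its equivalence class.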

We have completely defined the metric space $(X,\delta_\square)$ and the special points $A(G)$.
This determines the promised convergence on graphs. Let us give some brief pointers to
the main properties of this construction.

Lovász and Szegedy [29, Theorem 5.1] proved that the metric space $(X,\delta_\square)$ is compact.
Also, they showed [28, Theorem 2.2] that the set $\{[W_G] : G \text{ is a graph}\}$ is dense in
$(X,\delta_\square)$, that is, every graphit $[W]$ with $W \in \mathcal{W}_I$ is a limit of some sequence of graphs.

Any graph sequence $G_n$ with $e(G_n) = o(v(G_n)^2)$ as $n \to \infty$ converges to the graphit
$[\mathrm{Const}(0)]$, where for $a \in I$, $\mathrm{Const}(a) \in \mathcal{W}_I$ is the constant function that assumes the
value $a$. This is why the phrase "convergence of dense graphs" is often used.

The graphon $W_G$ can be viewed as a version of the adjacency matrix of a graph $G$.
However, a better informal interpretation of a general graphon $W \in \mathcal{W}_I$ is as a continuous
version of the matrix that encodes densities between parts of a (weak) regularity partition,
see [29, Section 5]. This also hints why, although we start with 0/1-valued functions $W_G$,
we have to allow general real-valued functions when we pass to limits. Having this data
for the graph, one can approximate, for example, the value of a max-cut: for graphons the
corresponding computation is the supremum of the integral in (9) over disjoint measurable
$S,T \subseteq I$.

For graphs $F$ and $G$ the density $t(F,G)$ of $F$ in $G$ is the probability that a random
(not necessarily injective) map $V(F) \to V(G)$ induces a homomorphism from $F$ into $G$.
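For small graphs this probability can be computed exactly by enumerating all $v(G)^{v(F)}$ maps. The Python sketch below is such a brute-force illustration; the function name `hom_density` and the edge-list encoding on vertex sets $\{0,\dots,k-1\}$ and $\{0,\dots,n-1\}$ are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

def hom_density(k, edges_f, n, edges_g):
    """Probability that a uniformly random map {0..k-1} -> {0..n-1} sends
    every edge of F to an edge of G."""
    adjacent = {(u, v) for u, v in edges_g} | {(v, u) for u, v in edges_g}
    homs = sum(1 for phi in product(range(n), repeat=k)
               if all((phi[a], phi[b]) in adjacent for a, b in edges_f))
    return Fraction(homs, n ** k)

triangle = [(0, 1), (1, 2), (0, 2)]
print(hom_density(2, [(0, 1)], 3, triangle))  # density of an edge in K_3
print(hom_density(3, triangle, 3, triangle))  # density of K_3 in K_3
```

For a single edge $F$ the density equals $2e(G)/v(G)^2$, since each edge of $G$ is hit by two ordered pairs.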

As it turns out, the subgraph densities behave well with respect to the $\delta_\square$-distance.
In combinatorial terms, this says, roughly speaking, that if for two graphs $G$ and $H$ on
$[n]$ we have

$$|\,e(G[A,B]) - e(H[A,B])\,| = o(n^2), \quad \text{for every } A,B \subseteq [n], \qquad (13)$$

then for every fixed graph $F$ we have $|t(F,G) - t(F,H)| = o(1)$. We refer the reader to
[28, Lemma 4.1] or [5, Theorems 2.3 and 3.7] for the precise statements and proofs. This
may be viewed as a version of the Counting Lemma: if we know the pairwise densities in
a regularity partition $V(G) = V_1 \cup \dots \cup V_k$ of a graph $G$, and generate the corresponding
$k$-partite random graph $H$ on $V(G)$, then as $v(G)$ and $k$ tend to infinity, with high
probability (13) holds, and we can approximate subgraph densities in $G$ by those in $H$.
This motivates the choice of the cut-norm to define the distance on graphons.
The role of $\phi$ in the definition of $\delta_\square$ is, in the discrete language, to overlay fractionally
the vertex sets of two graphs, cf. (3) here and [8, Section 5.1].

It is natural to define the density of a graph $F$ on $[k]$ in a graphit $[W]$ by picking an
arbitrary graph sequence $(G_n)$ convergent to $[W]$ and letting

$$t(F,[W]) = \lim_{n\to\infty} t(F,G_n). \qquad (14)$$

This is well-defined and does not depend on the choice of $(G_n)$. In fact, by writing
$t(F,G_n)$ as a $k$-fold sum and approximating it by a $k$-fold integral, one can show (see
[28, Lemma 4.1] or [8, Theorem 3.7.a]) that

$$t(F,[W]) = \int_{I^k} \prod_{\{i,j\}\in E(F)} W(x_i,x_j)\,dx_1\cdots dx_k. \qquad (15)$$

Furthermore, neither of these definitions depends on the choice of $W \in [W]$, so we can
write $t(F,W)$ in place of $t(F,[W])$. Also, we have $t(F,G) = t(F,W_G)$.

More generally, in terms of graphons, [28, Lemma 4.1] (see also [8, Theorem 3.7.a])
implies that the induced function $t(F,\cdot) : (X,\delta_\square) \to I$ is continuous for any $F$. Thus if
$(W_n)_{n\in\mathbb{N}}$ is $\delta_\square$-Cauchy, then the sequence $(t(F,W_n))_{n\in\mathbb{N}}$ of reals is Cauchy for every fixed
graph $F$. The converse of this also holds, by a result of Borgs et al. [8, Theorem 3.7.b].
Thus for $W,W_1,W_2,\dots \in \mathcal{W}_I$,

$$\lim_{n\to\infty} \delta_\square(W_n,W) = 0 \quad\text{if and only if}\quad \forall \text{ graph } F \ \ \lim_{n\to\infty} t(F,W_n) = t(F,W). \qquad (16)$$

It follows that each graphit $[W]$ is uniquely determined by its "moments function"
$t(\cdot,W)$. An algebraic characterization of all possible functions $t(\cdot,W)$ realizable by
some $W \in \mathcal{W}_I$ is given by Lovász and Szegedy [28, Theorem 2.2].

Let us also say a few words about graph limits and property testing. (See Goldreich,
Goldwasser, and Ron [19] for a precise definition of property testing and several fundamental
results.) In the most restrictive sense (the oblivious or order independent testing),
we have a (very big) unknown graph $G$ and are told the subgraph $G[X]$ induced by a
random $m$-set $X$ of vertices, where $m$ is a fixed number. It is known that with probability
at least $1 - \varepsilon$ we have $\delta_\square(W_{G[X]}, W_G) \le \varepsilon$, provided $m \ge m_0(\varepsilon)$ (see [28, Theorem 2.5] or
[U Theorem 3.7]). This means that we can learn a good $\delta_\square$-approximation to the graph
$G$. The objective of property testing is to approximate with high probability how far
$G$ is from a given property, but the edit distance $\delta_1$ is to be used here. Graphons
seem to provide very convenient tools and language for dealing with this problem (which
essentially amounts to relating the $\delta_1$ and $\delta_\square$ distances from an arbitrary graph to the
given property), see [5, 30].



3 Extending the $\delta_1$-Distance to Graph Limits

Here we show how to extend the distance $\delta_1$ from graphs to graphits. This definition is
standard but it seems that no formal proofs of some of its properties have appeared in
the literature. Therefore we give careful proofs of all claims (or references to them). The
author thanks László Lovász for pointing out that Lemma 11 can be deduced from the
results in [l],[3n], which is the proof presented here.

Here is the definition of $\delta_1$ for graphits. First, we define $\delta_1$ on $\mathcal{W}$, the set of graphons.
For $U,W \in \mathcal{W}$, let

$$\delta_1(U,W) = \inf\{\,\|U - W^\phi\|_1 : \phi \in \Phi_0\,\}, \qquad (17)$$

where $\|U - W^\phi\|_1$ is the standard $\ell_1$-norm of $U - W^\phi$ as defined by (8).

Clearly, $\delta_1$ is non-negative. It is symmetric by (10). Also, $\delta_1$ satisfies the Triangle
Inequality. Indeed, for every $U,V,W \in \mathcal{W}$ and $\varepsilon > 0$ we can choose $\phi,\psi \in \Phi_0$ such
that $\|U^\phi - V\|_1 < \delta_1(U,V) + \varepsilon$ and $\|V - W^\psi\|_1 < \delta_1(V,W) + \varepsilon$. Now, by the Triangle
Inequality for the $\ell_1$-norm,

$$\delta_1(U,W) \le \|U - (W^\psi)^{\phi^{-1}}\|_1 = \|U^\phi - W^\psi\|_1 \le \|U^\phi - V\|_1 + \|V - W^\psi\|_1 < \delta_1(U,V) + \delta_1(V,W) + 2\varepsilon. \qquad (18)$$

Since $\varepsilon > 0$ was arbitrary, the claim follows. Hence, $\delta_1$ is a pre-metric on $\mathcal{W}_I$.

We will present an equivalent definition of $\delta_1$ in Lemma 9 and will conclude in
Corollary 12 that $\delta_1$ gives a metric on $X$. Let us state a few auxiliary or related results
first.

Lemma 2 Let an integrable $W : (I^2,\mathcal{L},\mu) \to (\mathbb{R},\mathcal{B})$ satisfy $\|W\|_\square = 0$. Then $W = 0$
a.e. In particular, for any $U,W \in \mathcal{W}$, $\|U - W\|_\square = 0$ implies that $\|U - W\|_1 = 0$.

Proof. Let $Z$ be the Lebesgue set of the function $W$, which can be defined as the set of
those $(x,y)$ in the interior of $I^2$ such that

$$\lim_{c\to 0} \frac{1}{\mu(R_{x,y,c})} \int_{R_{x,y,c}} |W(x',y') - W(x,y)|\,d\mu(x',y') = 0, \qquad (19)$$

where $R_{x,y,c}$ is the open rectangle $(x-c,x+c) \times (y-c,y+c)$.

The Lebesgue Differentiation Theorem ([15, Theorem 3.21]) implies that $\mu(Z) = 1$.
If $W(x,y) \neq 0$ for some $(x,y) \in Z$, then by (19) there is $c > 0$ such that $R_{x,y,c} \subseteq I^2$ and

$$\int_{R_{x,y,c}} |W(x',y') - W(x,y)|\,d\mu(x',y') \le 2c^2\,|W(x,y)|.$$

Since $\mu(R_{x,y,c}) = 4c^2$, we obtain $\|W\|_\square \ge \bigl|\int_{R_{x,y,c}} W\bigr| \ge 2c^2\,|W(x,y)| > 0$, a contradiction. Thus $W = 0$ a.e. ∎

A function $U : I^2 \to \mathbb{R}$ is called an interval step function if there is a partition
$I = I_1 \cup \dots \cup I_k$ into finitely many intervals such that $U$ is constant on each rectangle
$I_i \times I_j$. Any interval step function is a simple function. Of course, such $U$ is necessarily
measurable, even in the strongest sense as a function from $(I^2,\mathcal{B})$ to $(\mathbb{R}, 2^{\mathbb{R}})$.

Lemma 3 For any $\varepsilon > 0$ and any integrable function $W : (I^2,\mathcal{L},\mu) \to (\mathbb{R},\mathcal{B})$ there is
an interval step function $U$ such that $\|W - U\|_1 < \varepsilon$. Moreover, if $W \in \mathcal{W}_I$, then we can
also require that $U \in \mathcal{W}_I$.

Proof. The first part of the lemma follows from [15, Theorem 2.41] (see also [HI Lemma 3.2]).
Let us establish the second part. Let $W \in \mathcal{W}_I$ and $U_0$ be the interval step function
with $\|W - U_0\|_1 < \varepsilon$, given by the first part. Let $U_1(x,y) = g(U_0(x,y))$, where
$g(z) = \max(0, \min(1,z))$ maps $z \in \mathbb{R}$ to the nearest point from $I$. Since for every $z' \in I$
and $z \in \mathbb{R}$ we have $|g(z) - z'| \le |z - z'|$, we conclude that $\|U_1 - W\|_1 \le \|U_0 - W\|_1 < \varepsilon$.
Finally, we take $U(x,y) = (U_1(x,y) + U_1(y,x))/2$. Then the new interval step function
$U$ belongs to $\mathcal{W}_I$. Also, in view of the inequality $|a - c| + |b - c| \ge 2\,\bigl|\frac{a+b}{2} - c\bigr|$ valid for any
$a,b,c \in \mathbb{R}$, we have $\|W - U\|_1 \le \|W - U_1\|_1 < \varepsilon$, as desired. ∎
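The two operations used in the second part of the proof, truncating values to $[0,1]$ and then symmetrising, can be tried out on a discretised example. In the Python sketch below a $k \times k$ matrix stands for an interval step function with $k$ equal steps; the helper names `clamp`, `l1_distance` and `clamp_and_symmetrize` are hypothetical, and the sample matrices are chosen only for illustration.

```python
from fractions import Fraction

def clamp(z):
    """Nearest point of [0, 1] to z, i.e. g(z) = max(0, min(1, z))."""
    return max(0, min(1, z))

def l1_distance(u, w):
    """l_1-distance between two step functions with k equal steps."""
    k = len(u)
    total = sum(abs(u[i][j] - w[i][j]) for i in range(k) for j in range(k))
    return Fraction(total) / (k * k)

def clamp_and_symmetrize(u0):
    """Apply g entrywise, then average the result with its transpose."""
    k = len(u0)
    u1 = [[clamp(z) for z in row] for row in u0]
    return [[(u1[i][j] + u1[j][i]) / 2 for j in range(k)] for i in range(k)]

F = Fraction
w = [[0, 1], [1, 0]]                         # symmetric, values in [0, 1]
u0 = [[F(-1, 2), F(3, 2)], [F(1, 2), F(1, 4)]]  # arbitrary approximation
u = clamp_and_symmetrize(u0)
print(l1_distance(u0, w), l1_distance(u, w))
```

In this example the distance to the symmetric target drops from 7/16 to 3/16, in line with the two inequalities used in the proof: neither step can increase the distance to a symmetric function with values in $[0,1]$.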

Remark. This approximation resembles the one given by the Weak Regularity Lemma of
Frieze and Kannan [17] (see also [29, Section 2]) with respect to the cut-norm, except we
cannot bound the number of parts in Lemma 3 in terms of $\varepsilon$ only. This is an important
distinction between the cut-norm and the $\ell_1$-norm, giving another motivation for taking
$\delta_\square$ as the distance between graphons. This allows one to construct a finite $\varepsilon$-net for the
metric space $(X,\delta_\square)$. Namely, let $n = n(\varepsilon)$ be large and take all interval step functions
with steps $[\frac{i-1}{n}, \frac{i}{n})$ that assume values in $\{\frac{1}{n},\dots,\frac{n}{n}\}$: there are at most $n^{n^2} < \infty$ such
functions. Thus $(X,\delta_\square)$ is totally bounded, which is one of the ingredients needed for
compactness. See [29, Theorem 5.1] for more details.

Lemma 4 Let $X,Y \in \mathcal{L}_I$ have measure 1 and let $\psi$ be a bijection from $X$ onto $Y$ such
that for any interval $J \subseteq I$ the sets $\psi(J \cap X)$ and $\psi^{-1}(J \cap Y)$ are Lebesgue measurable
with $\mu(\psi(J \cap X)) = \mu(\psi^{-1}(J \cap Y)) = \mu(J)$. Then there is $\phi \in \Phi_0$ such that $\phi = \psi$ a.e.

Proof. Suppose first that $|I \setminus X| = |I \setminus Y| = \mathfrak{c}$, that is, the cardinality of both $I \setminus X$
and $I \setminus Y$ is continuum. Let $\phi$ be an arbitrary bijection between $I \setminus X$ and $I \setminus Y$ while
$\phi(x) = \psi(x)$ if $x \in X$. Then $\phi = \psi$ a.e. Also, for any interval $J \subseteq I$, the pre-image
$\phi^{-1}(J)$ differs from $\psi^{-1}(J \cap Y) \in \mathcal{L}$ on a set of measure 0, so it is Lebesgue measurable
of measure $\mu(J)$. Since $\mathcal{B}$ is generated by intervals as a $\sigma$-algebra ([Ul Theorem 1.6]),
it follows (e.g. by application of the uniqueness claim of [15, Theorem 1.14]) that $\phi$
is a measure preserving function from $(I,\mathcal{L})$ to $(I,\mathcal{B})$. But a subset of $I$ is a Lebesgue
measurable set if and only if it can be sandwiched between two Borel sets of the same
measure ([15, Theorem 1.19]). This easily implies that $\phi$ is a measure preserving map
from $(I,\mathcal{L})$ to $(I,\mathcal{L})$, that is, $\phi \in \Phi$. Likewise, $\phi^{-1} \in \Phi$, giving $\phi \in \Phi_0$ as required.

Finally, suppose that, for example, $|I \setminus X| < \mathfrak{c}$. Let $C \subseteq I$ be the Cantor set, which
has measure 0 and cardinality continuum [15, Proposition 1.22]. Let $X' = X \setminus C$ and
$Y' = Y \setminus \psi(X \cap C)$. Then $\psi$ maps $X'$ bijectively onto $Y'$. Also, $\mu(\psi(X \cap C)) = 0$.
Indeed, for every $\varepsilon > 0$, we can find a set $J \supseteq C$ which is the union of finitely many
intervals of total length at most $\varepsilon$. By the assumption of the lemma,
$\psi(X \cap J)$ has measure at most $\varepsilon$. Since $\varepsilon > 0$ was arbitrary, $\mu(\psi(X \cap C)) = 0$. Thus
$\mu(X \setminus X') = \mu(Y \setminus Y') = 0$ and the restriction $\psi|_{X'}$ satisfies the assumptions of the
lemma. Since $|I \setminus X'| = |I \setminus Y'| = \mathfrak{c}$, we already know how to find the required $\phi \in \Phi_0$
for $\psi|_{X'}$. The very same function works for $\psi$ as well. ∎

Let us call a point $x$ lying inside a Lebesgue set $A \subseteq \mathbb{R}$ a density point of $A$ if

$$\lim_{c\to 0,\ c>0} \frac{\mu(A \cap (x-c,x+c))}{2c} = 1,$$

or equivalently, if $x$ belongs to the Lebesgue set (as defined by the 1-dimensional version
of (19)) of the characteristic function $\mathbb{I}_A : \mathbb{R} \to \{0,1\}$ of $A$. Again, Theorem 3.21 in [15]
implies that almost every point of $A$ is a density point.

The arithmetic operations and the linear order on $I = [0,1]$ play no role in the
definition of graphons; see [H Section 2.1] for a more general point of view. The following
simple lemma suffices for our purposes.

Lemma 5 For every partition $I = A_1 \cup \dots \cup A_k$ into Lebesgue measurable sets $A_i$ there
are a partition $I = I_1 \cup \dots \cup I_k$ into intervals and $\phi \in \Phi_0$ such that $\mu(\phi(A_i) \triangle I_i) = 0$
for each $i \in [k]$.

Proof. It is enough to prove the case $k = 2$ with the general claim following by a simple
induction on $k$.

Let $a_1 = 0$, $a_2 = \mu(A_1)$, $I_1 = [0,a_2]$, and $I_2 = I \setminus I_1$. Assume that $0 < a_2 < 1$ (for
otherwise any $\phi \in \Phi_0$ works).

Let $i = 1$ or $2$. Let $X_i \subseteq A_i$ be the set of density points of $A_i$. For $x \in X_i$ let

$$\psi(x) = a_i + \mu(A_i \cap [0,x]).$$

Then $\psi(X_i)$ lies in the interior of $I_i$. Indeed, if, for example, $\psi(x) = a_i$, then
$\mu(A_i \cap (-\infty,x)) = 0$, so $x$ cannot be a density point for $A_i$. Likewise, if $y \in X_i \setminus \{x\}$ is another
density point of $A_i$, then $\psi(y) \neq \psi(x)$. Let $Y_i = \psi(X_i)$. The pre-image under $\psi$ of any
open interval $J = (a_i, a_i + b) \subseteq I_i$ is the intersection of the interval $(0,c)$ with $X_i$, where

$$c = \sup\{x \in I : \mu(A_i \cap [0,x]) < b\} = \sup\{x \in I : \mu(X_i \cap [0,x]) < b\}.$$

Since $b \le \mu(X_i)$ and the measure $\mu$ is continuous from below ([15, Theorem 1.8.c]), we
conclude that $\mu(\psi^{-1}(J)) = b = \mu(J)$. Also, for any open interval $J = (b,c) \subseteq I$, the
image under $\psi$ of $X_i \cap J$ is $Y_i \cap J_i$, where

$$J_i = \bigl(a_i + \mu(A_i \cap [0,b]),\ a_i + \mu(A_i \cap [0,c])\bigr)$$

is a subinterval of $I_i$ with $\mu(J_i) = \mu(J \cap X_i)$.

Let $X = X_1 \cup X_2$ and $Y = Y_1 \cup Y_2$. It routinely follows that all assumptions of
Lemma 4 with respect to the bijection $\psi : X \to Y$ are satisfied. The element $\phi \in \Phi_0$
returned by Lemma 4 has the required properties. ∎

Lemma 6 For every interval step function $U \in \mathcal{W}$ and $\phi \in \Phi$ there is $\psi \in \Phi_0$ such that
$(U^\phi)^\psi = U$ a.e.

Proof. Let $I = I_1 \cup \dots \cup I_k$ be a partition into intervals such that $U$ is constant on
each rectangle $I_i \times I_j$. For $i,j \in [k]$, let $a_{i,j} = \mu(A_{i,j})$, where $A_{i,j} = I_j \cap \phi^{-1}(I_i)$.
Since $\phi$ is measure preserving, $\sum_{j=1}^{k} a_{i,j} = \mu(I_i)$ for every $i \in [k]$. Partition the interval
$I_i = I_{i,1} \cup \dots \cup I_{i,k}$ into intervals of lengths respectively $a_{i,1},\dots,a_{i,k}$. By Lemma 5 find
$\eta \in \Phi_0$ such that $\mu(\eta(A_{i,j}) \triangle I_{i,j}) = 0$. The element $\psi = \eta^{-1} \in \Phi_0$ has the required
properties by (6) because for a.e. $x \in I_{i,j}$ we have $\psi(x) \in A_{i,j}$ and $\phi(\psi(x)) \in I_i$. ∎

Lemmas 3 and 6 easily imply the following result.

Corollary 7 For any $U,W \in \mathcal{W}$ and $\phi \in \Phi$, we have $\delta_1(U,W) = \delta_1(U^\phi,W)$. ∎

Theorem 8 For $U,W \in \mathcal{W}$, the following are equivalent.
(a) For every graph $F$, we have $t(F,U) = t(F,W)$.
(b) $\delta_\square(U,W) = 0$.
(c) There are $\phi,\psi \in \Phi$ such that $U^\phi = W^\psi$ a.e.

Proof. The equivalence of (a) and (b) follows from (16) (i.e. from [28, Lemma 4.1] and [8,
Theorem 3.7]). The equivalence of (a) and (c) is proved by Borgs, Chayes, and Lovász [H
Corollary 2.2]. ∎

Lemma 9 For any $U, W \in \mathcal{W}$, we have

$$\delta_1(U, W) = \inf_{\phi, \psi \in \Phi} \|U^\phi - W^\psi\|_1. \tag{20}$$

Proof. Since $\Phi_0$ is a subset of $\Phi$ and $\Phi_0$ contains the identity function $\mathrm{Id} : I \to I$, the "$\ge$"-inequality in (20) easily follows. Let us show the converse.

Let $U, W \in \mathcal{W}$ and $\varepsilon > 0$. By Lemma 3 we can find interval step functions $U_0$ and $W_0$ lying within $\varepsilon$ from respectively $U$ and $W$ in the $L^1$-norm. For any $\phi, \psi \in \Phi$, we have by (6)

$$\|U^\phi - W^\psi\|_1 \ge \|U_0^\phi - W_0^\psi\|_1 - \|U^\phi - U_0^\phi\|_1 - \|W^\psi - W_0^\psi\|_1 \ge \|U_0^\phi - W_0^\psi\|_1 - 2\varepsilon.$$

Likewise, $\|U - W^\psi\|_1 \le \|U_0 - W_0^\psi\|_1 + 2\varepsilon$. Since $\varepsilon > 0$ was arbitrary, it is enough to prove (20) on the additional assumption that $U$ and $W$ are interval step functions.

Again, let $\varepsilon > 0$. Let $\phi, \psi \in \Phi$ be such that $\|U^\phi - W^\psi\|_1 - \varepsilon$ is at most the right-hand side of (20). By Lemma 6 choose $\eta \in \Phi_0$ such that $(W^\psi)^\eta = W$ a.e. Then, by (6),

$$\|U^\phi - W^\psi\|_1 = \|(U^\phi)^\eta - (W^\psi)^\eta\|_1 = \|U^{(\phi \circ \eta)} - W\|_1. \tag{21}$$

Again, by Lemma 6 applied to $U$ and $\phi \circ \eta \in \Phi$, find $\nu \in \Phi_0$ such that $(U^{(\phi \circ \eta)})^\nu = U$ a.e. From (21) we conclude that $\|U^\phi - W^\psi\|_1 = \|U - W^\nu\|_1$, which is at least $\delta_1(U, W)$, the left-hand side of (20). Since $\varepsilon$ was arbitrary, the lemma follows. ∎

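Lemma 10 For any two graphs $G$ and $H$, the $\delta_1$-distance $\delta_1(G, H)$ defined by (3) is equal to $\delta_1(W_G, W_H)$, where $W_G$ and $W_H$ are the step-function graphons of $G$ and $H$ defined earlier.

Proof. Let $V(G) = \{x_1, \dots, x_m\}$ and $V(H) = \{y_1, \dots, y_n\}$. For $\phi \in \Phi_0$, $\|W_G - W_H^\phi\|_1$ equals the expression in (3) with $\alpha_{i,j} = \mu(I_i \cap \phi^{-1}(J_j))$, $I_i = \left(\frac{i-1}{m}, \frac{i}{m}\right)$ and $J_j = \left(\frac{j-1}{n}, \frac{j}{n}\right)$. Conversely, given numbers $\alpha_{i,j}$ such that the matrix $(\alpha_{i,j})_{i \in [m],\, j \in [n]}$ has row sums $1/m$ and column sums $1/n$, one can easily construct $\phi \in \Phi_0$ giving these $\alpha_{i,j}$ as above. ∎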

Lemma 11 Let $U, W \in \mathcal{W}$ satisfy $\delta_\square(U, W) = 0$. Then $\delta_1(U, W) = 0$.

Proof. By Theorem 8 there are $\phi, \psi \in \Phi$ such that $U^\phi = W^\psi$ a.e. The claim follows by using the equivalent definition of $\delta_1$ from Lemma 9. ∎

Corollary 12 The function $\delta_1$ induces a metric on the set $\mathcal{X}$ of graphits, extending the $\delta_1$-distance from graphs. ∎

Remark. Let us point out that convergence with respect to the cut-distance does not generally imply convergence with respect to $\delta_1$. For example, the infinite sequence of random graphs $G_n \in \mathcal{G}_{n,1/2}$ converges in the $\delta_\square$-distance with probability 1 to the graphit [Const(1/2)] by [28, Corollary 2.6], while no graph sequence whatsoever can converge in the $\delta_1$-distance to [Const(1/2)] by Theorem 17 here.



4 Comparing the Discrete and Fractional $\delta_1$-Distances

Clearly, for graphs $G$ and $H$ of the same order we have $\hat\delta_1(G, H) \ge \delta_1(G, H)$, where $\hat\delta_1$ is defined by (2). The distances $\hat\delta_1$ and $\delta_1$ do not coincide in general, as Example 13 demonstrates. Independently, Arie Matsliah (see [20, Appendix B]) presented another construction that achieves ratio 6/5. Although our ratio is smaller (only 11/10), the ideas behind our construction are different from those of Matsliah and might be useful in the quest for better ratios. Hence, we decided to keep this example in the paper.

Example 13 There are graphs $G$ and $H$ such that $v(G) = v(H)$ but

$$\hat\delta_1(G, H) \ge \frac{11}{10}\,\delta_1(G, H) > 0.$$

Proof. Fix an integer $n > 24$. Pick disjoint sets $X = \{x_1, \dots, x_4\}$, $M = M_1 \cup \dots \cup M_4$, and $N = N_1 \cup \dots \cup N_5$ with each $M_i$ having 4 elements and each $N_j$ having $n$ elements.

Let $V(G) = V(H) = N \cup M \cup X$. It will be the case that $N \cup M$ spans the same subgraph in both $G$ and $H$. Namely, $N$ spans the complete graph while, for $i \in [4]$, we put the complete bipartite graph between $M_i$ and $\bigcup_{j \in [5] \setminus \{i\}} N_j$. These are all edges inside $M \cup N$.

Fix another partition $M = L_1 \cup \dots \cup L_4$ such that each $L_i$ has 4 elements and $|L_i \cap M_i| = |L_{i+1} \cap M_i| = 2$ for $i \in [4]$, where we agree that $L_5 = L_1$.

In $G$, the edges incident to $X$ are as follows: $\{x_i, x_j\}$ for $1 \le i < j \le 4$ with $j - i$ even plus all pairs $\{x_i, y\}$ for $i \in [4]$ and $y \in M_i$. In $H$, the edges incident to $X$ are as follows: $\{x_i, x_j\}$ for $1 \le i < j \le 4$ with $j - i$ odd plus all pairs $\{x_i, y\}$ for $i \in [4]$ and $y \in L_i$.

We have

$$|E(G) \,\triangle\, E(H)| = \sum_{i=1}^{4} |M_i \,\triangle\, L_i| + \binom{4}{2} = 22. \tag{22}$$

Let us show that this is smallest possible. Pick an optimal bijection $\sigma : V(G) \to V(H)$. In each of $G$ and $H$, every vertex in $N$ has degree at least $5n - 1$ while any vertex in $M \cup X$ has degree at most $4n + 1$. Hence, if $\sigma$ does not preserve $N$, then the number of discrepancies will be at least $(5n - 1) - (4n + 1) > 22$. So, assume that $\sigma(N) = N$. Likewise, we have $\sigma(M_i) = M_i$, for otherwise the number of discrepancies (between $M$ and $N$) is at least $n > 22$. Finally, consider the action of $\sigma$ on $X$. For every $x, y \in X$, their neighborhoods in $M$ with respect to $G$ and $H$ differ by at least 4. If $\sigma$ does not map some $x_i$ into $\{x_i, x_{i+1}\}$, where $x_5 = x_1$, then the neighborhoods $N_G(x_i)$ and $N_H(\sigma(x_i))$ in $M$ are disjoint and this vertex alone creates at least 8 discrepancies. Moreover, since $X$ spans 2 and 4 edges in $G$ and $H$ respectively, the total number of discrepancies is at least $8 + 3 \times 4 + 2 = 22$ and we cannot improve (22). Thus let us assume that $\sigma(x_i) \in \{x_i, x_{i+1}\}$ for every $i \in [4]$. This implies that $\sigma$ either is the identity on $X$ or shifts indices by 1. In either case, this gives the same bound as in (22).

Hence, $\hat\delta_1(G, H) \ge \frac{44}{(5n+20)^2}$. Let us establish an upper bound on $\delta_1(G, H)$ now.

Let $G[2]$ be the 2-fold blow-up of $G$, where each vertex $x$ is replaced by two vertices $x', x''$ and each edge $\{x, y\}$ by the complete bipartite graph with parts $\{x', x''\}$ and $\{y', y''\}$. For $Y \subseteq V(G)$, let $Y[2] = \{y', y'' : y \in Y\}$. Consider the following bijection $\sigma$ between the vertex sets of $G[2]$ and $H[2]$. It is the identity bijection on $M[2] \cup N[2]$. For $i \in [4]$, let $\sigma(x_i') = x_i'$ and $\sigma(x_i'') = x_{i+1}''$. Easy checking shows that $\sigma$, when restricted to $X[2]$, mismatches only 16 adjacencies (versus $4 \times \binom{4}{2} = 24$ if $\sigma$ were the identity). The number of discrepancies between $X[2]$ and $M[2]$ is $4 \times 16$. We have

$$\delta_1(G, H) \le \hat\delta_1(G[2], H[2]) \le \frac{2}{(2(5n+20))^2}\,(4 \times 16 + 16) = \frac{40}{(5n+20)^2} \le \frac{10}{11}\,\hat\delta_1(G, H).$$ ∎
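The two counts used in Example 13 can be checked mechanically. The following is a minimal sketch, not part of the original argument: the vertex labels, the particular choice of the partition $L_1, \dots, L_4$, and all function names are assumptions made for illustration. It builds $G$ and $H$ for $n = 25$, counts the adjacencies mismatched by the identity overlay, and then counts those mismatched by the bijection $\sigma$ on the 2-fold blow-ups. Under these assumptions the two printed numbers should agree with the values 22 and $4 \times 16 + 16 = 80$ from the proof.

```python
from itertools import combinations


def build_example(n=25):
    """Edge sets of G and H from Example 13 (labels are illustrative)."""
    X = [("x", i) for i in range(4)]
    M = [[("m", i, k) for k in range(4)] for i in range(4)]
    N = [[("n", j, k) for k in range(n)] for j in range(5)]
    # Assumed choice: L_i takes two vertices of M_i and two of M_{i-1} (mod 4),
    # so |L_i & M_i| = |L_{i+1} & M_i| = 2 as the example requires.
    L = [M[i][:2] + M[(i - 1) % 4][2:] for i in range(4)]

    common = {frozenset(p) for p in combinations(sum(N, []), 2)}  # N is complete
    for i in range(4):
        for j in range(5):
            if j != i:  # M_i is joined to every N_j with j != i
                common |= {frozenset((u, v)) for u in M[i] for v in N[j]}

    G, H = set(common), set(common)
    for i, j in combinations(range(4), 2):
        # j - i even: edge of G; j - i odd: edge of H
        (G if (j - i) % 2 == 0 else H).add(frozenset((X[i], X[j])))
    for i in range(4):
        G |= {frozenset((X[i], y)) for y in M[i]}
        H |= {frozenset((X[i], y)) for y in L[i]}
    return G, H


def blow_up(edges):
    """2-fold blow-up: vertex v becomes (v, 0) and (v, 1)."""
    return {frozenset(((u, s), (v, t)))
            for u, v in map(tuple, edges) for s in (0, 1) for t in (0, 1)}


def sigma(vertex):
    """Bijection from the proof: fix x_i', send x_i'' to x_{i+1}''."""
    v, copy = vertex
    if v[0] == "x" and copy == 1:
        return (("x", (v[1] + 1) % 4), 1)
    return vertex


def discrepancies(edges_g, edges_h, bijection=lambda v: v):
    """Number of vertex pairs adjacent in exactly one graph under the bijection."""
    image = {frozenset(map(bijection, e)) for e in edges_g}
    return len(image ^ edges_h)


G, H = build_example()
print(discrepancies(G, H))                           # identity overlay of G and H
print(discrepancies(blow_up(G), blow_up(H), sigma))  # overlay of the blow-ups
```

Normalising by $2 / v^2$ with $v = 5n + 20$ for the first count and $v = 2(5n + 20)$ for the second gives the ratio $44 : 40 = 11 : 10$.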

Lemma 14 For any two graphs $G$ and $H$ on the same vertex set $[n]$, we have

$$\hat\delta_1(G, H) \le 3\,\delta_1(G, H).$$

Proof. If $G \cong H$, then $\hat\delta_1(G, H) = \delta_1(G, H) = 0$, so assume $G \not\cong H$. Let $\ell = n^2\,\hat\delta_1(G, H)/2$ be the smallest number of adjacencies we have to change in $G$ to make it isomorphic to $H$.

Let $A = (\alpha_{i,j})_{i,j \in [n]}$ be an optimal overlay matrix as in (3), where we assume $x_i = i$ and $y_j = j$. (Thus $nA$ is doubly-stochastic.)

Although $nA$ can be represented as a convex combination of permutation matrices by Birkhoff's theorem [3], we find it more convenient to work with an approximation where all coefficients are equal. (Thus some permutation matrices may be repeated more than once.) Such an approximation is easy to find as follows.

Pick a large $m \ge m_0(A)$. Inductively on $i$, we construct permutation matrices $P_1, P_2, \dots$ as follows. Suppose that $i \ge 0$ and we have already found $P_1, \dots, P_i$ such that $P' = P_1 + \dots + P_i \le mnA$ (where matrix inequalities are meant component-wise). If there is a permutation matrix $P_{i+1}$ such that $P' + P_{i+1} \le mnA$, take it and repeat the step.

Suppose that no such $P_{i+1}$ exists. Let $B = (\beta_{f,g})_{f,g \in [n]} = mnA - P'$. This is a non-negative matrix with row/column sums $m - i$. By Hall's Marriage theorem [22], there is a set $R \subseteq [n]$ of $r$ rows and a set $S \subseteq [n]$ of $n - r + 1$ columns such that each entry of the $R \times S$-submatrix of $B$ is less than 1. Hence,

$$r(m - i) = \sum_{f \in R} \sum_{g=1}^{n} \beta_{f,g} = \sum_{f \in R} \sum_{g \in S} \beta_{f,g} + \sum_{f \in R} \sum_{g \in [n] \setminus S} \beta_{f,g} < r(n + 1 - r) + (m - i)(n - (n - r + 1)),$$

and therefore $m - i < r(n + 1 - r) \le (n + 1)^2/4$. Let $P_{i+1}, \dots, P_m$ be arbitrary permutation matrices and $P = \frac{1}{mn}(P_1 + \dots + P_m)$. It follows that

$$\|A - P\|_\infty \le 2 \times \frac{(n+1)^2}{4mn} = \frac{(n+1)^2}{2mn}.$$
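The greedy construction just described can be sketched in code. This is an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the search for a permutation matrix $P_{i+1}$ with $P' + P_{i+1} \le mnA$ is done with a standard augmenting-path bipartite matching, and the "arbitrary" padding permutations are taken to be the identity. Exact rational arithmetic is used so that the final bound $\|A - P\|_\infty \le (n+1)^2/(2mn)$ can be compared without rounding.

```python
from fractions import Fraction


def find_permutation(B):
    """Return perm with B[f][perm[f]] >= 1 for every row f, or None.

    Uses augmenting paths (Kuhn's algorithm) on the bipartite graph whose
    edges are the entries of B that are at least 1.
    """
    n = len(B)
    row_of_col = [None] * n

    def augment(f, seen):
        for g in range(n):
            if B[f][g] >= 1 and g not in seen:
                seen.add(g)
                if row_of_col[g] is None or augment(row_of_col[g], seen):
                    row_of_col[g] = f
                    return True
        return False

    if not all(augment(f, set()) for f in range(n)):
        return None
    perm = [0] * n
    for g, f in enumerate(row_of_col):
        perm[f] = g
    return perm


def greedy_permutations(D, m):
    """D = nA is doubly stochastic; return m permutations as in the proof."""
    n = len(D)
    B = [[m * D[f][g] for g in range(n)] for f in range(n)]  # B = mnA - P'
    perms = []
    while len(perms) < m:
        perm = find_permutation(B)
        if perm is None:  # no further permutation matrix fits below mnA
            break
        for f, g in enumerate(perm):
            B[f][g] -= 1
        perms.append(perm)
    perms += [list(range(n))] * (m - len(perms))  # arbitrary padding (identity)
    return perms


def max_error(D, perms):
    """Largest entry of |A - P|, where A = D/n and P averages the permutations."""
    n, m = len(D), len(perms)
    return max(abs(D[f][g] / n - Fraction(sum(p[f] == g for p in perms), m * n))
               for f in range(n) for g in range(n))


half, quarter, zero = Fraction(1, 2), Fraction(1, 4), Fraction(0)
D = [[half, half, zero], [quarter, quarter, half], [quarter, quarter, half]]
perms = greedy_permutations(D, m=7)
print(max_error(D, perms), "<=", Fraction((3 + 1) ** 2, 2 * 7 * 3))
```

The small matrix `D` is an arbitrary doubly stochastic example chosen for the sketch; any other doubly stochastic input with `Fraction` entries should work the same way.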

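Since $m$ is arbitrarily large, in order to prove the lemma it is enough to show that

$$\hat\delta_1(G, H) \le 3\,\delta_1(G, H, P), \tag{23}$$

where $\delta_1(G, H, P)$ is defined by (3) with the matrix $P$ in place of $A$.

Let $\sigma_1, \dots, \sigma_m : [n] \to [n]$ be the permutations encoded by $P_1, \dots, P_m$ respectively. As it was defined after (3), $\Delta$ is the set of all quadruples $(x, y, x', y') \in [n]^4$ such that exactly one of the relations $\{x, y\} \in E(G)$ and $\{x', y'\} \in E(H)$ holds. Note that we allow $x = y$ or $x' = y'$ but both equalities cannot hold simultaneously by the definition of $\Delta$.

For $(i, j) \in [m]^2$, let $\Delta(i, j)$ consist of $(x, y) \in [n]^2$ such that $(x, y, \sigma_i(x), \sigma_j(y)) \in \Delta$.

For $(x, y, x', y') \in \Delta$, let $I(x, y, x', y')$ consist of all pairs $(i, j) \in [m]^2$ such that $\sigma_i(x) = x'$ and $\sigma_j(y) = y'$. Also, for $X \subseteq [m]$, define

$$S_X = \sum_{\{i,j\} \in \binom{X}{2}} |\Delta(i, j)|.$$

We have

$$\delta_1(G, H, P) = \sum_{(x,y,x',y') \in \Delta} P_{x,x'} P_{y,y'} = \sum_{(x,y,x',y') \in \Delta} \Big( \sum_{i :\, \sigma_i(x) = x'} \frac{1}{mn} \Big) \Big( \sum_{j :\, \sigma_j(y) = y'} \frac{1}{mn} \Big)$$

$$= \frac{1}{m^2 n^2} \sum_{(x,y,x',y') \in \Delta} |I(x, y, x', y')| = \frac{1}{m^2 n^2} \sum_{(i,j) \in [m]^2} |\Delta(i, j)| = \frac{2 S_{[m]} + \sum_{i=1}^{m} |\Delta(i, i)|}{m^2 n^2}. \tag{24}$$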



Let us show that for any $1 \le g < i < j \le m$ we have

$$|\Delta(g, i)| + |\Delta(j, i)| + |\Delta(j, g)| \ge |\Delta(g, g)|. \tag{25}$$

Start with any $(x, y) \in \Delta(g, g)$. Let us transform $(x, y)$ into $(\sigma_g(x), \sigma_g(y))$ in three steps, where we consecutively apply $(\sigma_g, \sigma_i)$, $(\sigma_j^{-1}, \sigma_i^{-1})$, and $(\sigma_j, \sigma_g)$:

$$(x, y) \mapsto (\sigma_g(x), \sigma_i(y)) \mapsto (\sigma_j^{-1}(\sigma_g(x)), y) \mapsto (\sigma_g(x), \sigma_g(y)).$$

Since $(x, y, \sigma_g(x), \sigma_g(y)) \in \Delta$, at least one of these three steps changes adjacency. Depending on the number of the step when this happens, we get respectively that $(x, y) \in \Delta(g, i)$, $(\sigma_j^{-1}(\sigma_g(x)), y) \in \Delta(j, i)$, or $(\sigma_j^{-1}(\sigma_g(x)), y) \in \Delta(j, g)$. Conversely, suppose that we are given the resulting conclusion of the form $(u, v) \in \Delta(a, b)$ with distinct $a, b \in \{i, j, g\}$. The pair $(a, b)$ determines the number $k \in \{1, 2, 3\}$ of the step. This $k$, when combined with $(u, v)$, easily allows us to reconstruct the ordered pair $(x, y)$. Thus no element in the left-hand side of (25) is doubly counted. This proves (25).
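Inequality (25) can be sanity-checked numerically. The sketch below is only an illustration: the helper names, the choice of random graphs on 6 vertices, and the random permutations are assumptions, and a passing check is of course no substitute for the injection argument above. It counts $|\Delta(s, t)|$ directly from the definition (ordered pairs, with $x = y$ allowed) and compares both sides of (25).

```python
import random
from itertools import combinations


def delta_size(G, H, s, t, n):
    """|Delta(s, t)|: ordered pairs (x, y) in [n]^2 such that exactly one of
    {x, y} in E(G) and {s(x), t(y)} in E(H) holds (s, t are lists)."""
    return sum((frozenset((x, y)) in G) != (frozenset((s[x], t[y])) in H)
               for x in range(n) for y in range(n))


def check_inequality_25(G, H, sg, si, sj, n):
    """True if |D(g,i)| + |D(j,i)| + |D(j,g)| >= |D(g,g)| for these permutations."""
    lhs = (delta_size(G, H, sg, si, n) + delta_size(G, H, sj, si, n)
           + delta_size(G, H, sj, sg, n))
    return lhs >= delta_size(G, H, sg, sg, n)


def random_graph(n, rng):
    """Edge set of a random graph on {0, ..., n-1} with edge probability 1/2."""
    return {frozenset(p) for p in combinations(range(n), 2) if rng.random() < 0.5}


rng = random.Random(0)
n = 6
G, H = random_graph(n, rng), random_graph(n, rng)
sg, si, sj = (rng.sample(range(n), n) for _ in range(3))
print(check_inequality_25(G, H, sg, si, sj, n))
```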

By (25) (and $|\Delta(a, b)| = |\Delta(b, a)|$) we conclude that $S_{\{g,i,j\}} \ge |\Delta(g, g)| \ge 2\ell$. A simple averaging over all choices of $\{g, i, j\} \in \binom{[m]}{3}$ implies that $S_{[m]} \ge 2\ell \binom{m}{2} / \binom{3}{2} = \ell m(m-1)/3$. By (24) we have

$$\delta_1(G, H, P) \ge \frac{2\ell m(m-1)/3 + 2\ell m}{m^2 n^2} \ge \frac{2\ell}{3n^2} = \frac{\hat\delta_1(G, H)}{3},$$

finishing the proof of Lemma 14. ∎

Remark. The author thanks Alexander Razborov for the remarks that simplified the original proof of Lemma 14.



The interesting problem of finding the best possible constant in Lemma 14 remains open. At the moment, we know only that it is between 6/5 (see [20, Appendix B]) and 3.

The situation for the cut-distance is somewhat similar: the discrete version $\hat\delta_\square$ of $\delta_\square$, as defined by [8, Equation (2.6)], is not always equal to the $\delta_\square$-distance ([8, Section 5.1]) while for any two graphs $G$ and $H$ of the same order we have

$$\delta_\square(G, H) \le \hat\delta_\square(G, H) \le 32\,(\delta_\square(G, H))^{1/67}$$

([8, Theorem 2.3]). It is open whether $\hat\delta_\square(G, H)$ can be bounded from above by a linear function of $\delta_\square(G, H)$, see e.g. [8, Page 1830].



5 Characterization of Stability 

Recall that in the Introduction we defined when an extremal $(f, \mathcal{P})$-problem is stable. Here we give an alternative characterization. Since stability deals with relating the $\delta_1$ and $\delta_\square$ distances, it is not surprising that the methods developed by Lovász and Szegedy [30] in the context of property testing apply here.

Theorem 15 Let $\mathcal{P}$ be an arbitrary graph property with $\mathcal{P}_n \ne \emptyset$ for infinitely many $n$ and let $f$ be a graph parameter. Then the extremal $(f, \mathcal{P})$-problem is stable if and only if $\mathrm{LIM}(f, \mathcal{P})$ consists of a single graphit $[W]$, where moreover $W \in \mathcal{W}_I$ can be chosen to assume values 0 and 1 only.

The rest of this section is dedicated to proving Theorem 15, in the course of which we observe an interesting dichotomy result (Theorem 17).

We will need the following result, which is a special case of [30, Lemma 2.2].

Lemma 16 Let $W, W_1, W_2, \dots \in \mathcal{W}$ be such that $\|W_n - W\|_\square \to 0$ as $n \to \infty$. Let $S \in \mathcal{L}_{I^2}$. Then $\int_S W_n \, d\mu \to \int_S W \, d\mu$ as $n \to \infty$.

Sketch of Proof. If $S$ is a rectangle, then the conclusion follows from the definition of the cut-norm. A general $S \in \mathcal{L}_{I^2}$ can be approximated within any $\varepsilon > 0$ by a finite union of disjoint rectangles, cf. Lemma 3. ∎

Theorem 17 Let $W \in \mathcal{W}_I$ and let $W_1, W_2, \dots \in \mathcal{W}_I$ be an arbitrary sequence such that $\delta_\square(W_n, W) \to 0$ as $n \to \infty$.

If $\mu(W^{-1}(\{0, 1\})) = 1$ (that is, $W$ assumes only values 0 and 1 a.e.), then the sequence $(W_n)_{n \in \mathbb{N}}$ is necessarily convergent to $W$ in the $\delta_1$-distance.

If $\mu(W^{-1}(\{0, 1\})) < 1$ and each $W_n$ is a.e. $\{0, 1\}$-valued, then the sequence $(W_n)_{n \in \mathbb{N}}$ does not contain any Cauchy subsequence with respect to the $\delta_1$-distance.

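Proof. Suppose first that $W$ is $\{0, 1\}$-valued a.e. Let $S = W^{-1}(0) \in \mathcal{L}_{I^2}$. For each $n \in \mathbb{N}$ choose $\phi_n \in \Phi_0$ such that $\|W_n^{\phi_n} - W\|_\square < \delta_\square(W_n, W) + 1/n$. Clearly, $\|W_n^{\phi_n} - W\|_\square$ tends to 0, so by Lemma 16 we have

$$\delta_1(W_n, W) \le \|W_n^{\phi_n} - W\|_1 = \int_S W_n^{\phi_n} \, d\mu + \int_{I^2 \setminus S} (1 - W_n^{\phi_n}) \, d\mu \to \int_S W \, d\mu + \int_{I^2 \setminus S} (1 - W) \, d\mu = 0.$$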

Now, suppose that $\mu(W^{-1}(\{0, 1\})) < 1$ and that the second part of the theorem is false. By choosing a subsequence and relabeling, we can assume that $(W_n)_{n \in \mathbb{N}}$ itself is a Cauchy sequence with $\delta_1(W_m, W_n) < 1/2^m$ for every $m < n$. Let $\phi_1 : I \to I$ be the identity map and $U_1 = W_1$. Inductively on $n = 2, 3, \dots$, do the following. By induction, we assume that we have $U_{n-1} = W_{n-1}^{\phi_{n-1}}$ with $\phi_{n-1} \in \Phi_0$. By Corollary 7,

$$\delta_1(U_{n-1}, W_n) = \delta_1(W_{n-1}, W_n) < \frac{1}{2^{n-1}}.$$

Thus there is $\phi_n \in \Phi_0$ such that, letting $U_n = W_n^{\phi_n}$, we have

$$\|U_{n-1} - U_n\|_1 < \frac{1}{2^{n-1}}. \tag{26}$$

The sequence $(U_n)_{n \in \mathbb{N}}$ is Cauchy with respect to the $L^1$-norm: for $m < n$ we have

$$\|U_m - U_n\|_1 \le \sum_{i=m+1}^{n} \|U_{i-1} - U_i\|_1 < \sum_{i=m+1}^{n} \frac{1}{2^{i-1}} < \frac{1}{2^{m-1}}.$$

Since the normed space defined by (7) is complete ([15, Theorem 6.6]), the sequence $(U_n)_{n \in \mathbb{N}}$ has a limit $U \in L^1$:

$$\lim_{n \to \infty} \|U_n - U\|_1 = 0. \tag{27}$$

We have $\int_{I^2} |U(x, y) - U(y, x)| \, d\mu(x, y) = 0$ because it is at most

$$2\|U - U_n\|_1 + \int_{I^2} |U_n(x, y) - U_n(y, x)| \, d\mu(x, y) = 2\|U - U_n\|_1 \to 0.$$

Thus $U$ is symmetric a.e. on $I^2$ by e.g. [15, Proposition 2.16]. Likewise, $0 \le U(x, y) \le 1$ a.e. By changing $U$ on a subset of $I^2$ of measure zero, we can assume that $U \in \mathcal{W}_I$. By the Triangle Inequality,

$$\delta_\square(U, W) \le \delta_\square(U, U_n) + \delta_\square(U_n, W) \le \delta_1(U, U_n) + \delta_\square(W_n^{\phi_n}, W).$$

This tends to 0 as $n \to \infty$. Thus $\delta_\square(U, W) = 0$ and by Theorem 8, $U^\phi = W^\psi$ a.e. for some $\phi, \psi \in \Phi$. Thus $U$ is not $\{0, 1\}$-valued a.e.

For $m \in \mathbb{N}$, let

$$A_m = \{(x, y) \in I^2 : 1/m < U(x, y) < 1 - 1/m\}.$$

Each $A_m$ is Lebesgue measurable since $U$ is measurable. Also, $Z = \bigcup_{m \in \mathbb{N}} A_m = \{z \in I^2 : U(z) \notin \{0, 1\}\}$ has positive measure $c$. By the continuity from below ([15, Theorem 1.8.c]) of the measure $\mu$, there is $m \in \mathbb{N}$ with $\mu(A_m) > c/2$. Since each $U_n = W_n^{\phi_n}$ is $\{0, 1\}$-valued by assumption, we have $\|U_n - U\|_1 > c/(2m)$. This contradicts (27) and finishes the proof of the theorem. ∎

Remark. The first part of Theorem 17 can also be deduced from [30, Lemma 2.9].

Corollary 18 Let a sequence of graphs $G_1, G_2, \dots$ converge in the $\delta_\square$-distance to a graphit $[W]$. Then the sequence $(G_n)_{n \in \mathbb{N}}$ converges to $[W]$ in the $\delta_1$-distance if and only if $W$ is $\{0, 1\}$-valued a.e. ∎

Proof of Theorem 15. Suppose first that the extremal $(f, \mathcal{P})$-problem is stable, as defined in Section 1. Let $[U], [W] \in \mathrm{LIM}(f, \mathcal{P})$. Choose witnesses of this, that is, sequences of almost extremal graphs $(G_{m_i})_{i \in \mathbb{N}}$ and $(H_{n_i})_{i \in \mathbb{N}}$ with $G_{m_i} \to U$ and $H_{n_i} \to W$ in the cut-distance as $i \to \infty$. By stability, $\delta_1(G_{m_i}, H_{n_i}) \to 0$. Hence,

$$\delta_\square(U, W) \le \delta_\square(U, G_{m_i}) + \delta_\square(G_{m_i}, H_{n_i}) + \delta_\square(H_{n_i}, W) \le \delta_1(G_{m_i}, H_{n_i}) + o(1) = o(1).$$

Thus $\delta_\square(U, W) = 0$. Since $[U], [W] \in \mathrm{LIM}(f, \mathcal{P})$ were arbitrary, the limit set $\mathrm{LIM}(f, \mathcal{P})$ consists of a single graphit $[W]$. Since $(G_{m_i})$ is Cauchy with respect to the $\delta_1$-distance, we conclude by Theorem 17 that $W$ is $\{0, 1\}$-valued a.e., proving one direction of the theorem.

Conversely, suppose that $\mathrm{LIM}(f, \mathcal{P}) = \{[W]\}$ for a $\{0, 1\}$-valued $W \in \mathcal{W}_I$. Suppose on the contrary that the extremal problem is not stable. This implies that there is some $\varepsilon > 0$ such that for every $i \in \mathbb{N}$ there are $m_i, n_i > i$, $G_{m_i} \in \mathcal{P}_{m_i}$, $H_{n_i} \in \mathcal{P}_{n_i}$ such that $f(G_{m_i}) \ge \mathrm{ex}_f(m_i, \mathcal{P}) - 1/i$, $f(H_{n_i}) \ge \mathrm{ex}_f(n_i, \mathcal{P}) - 1/i$ and

$$\delta_1(G_{m_i}, H_{n_i}) > \varepsilon. \tag{28}$$

By choosing a subsequence and relabeling, we can additionally assume that for every $i < j$ we have $m_i < n_i < m_j < n_j$.

By the compactness of $(\mathcal{X}, \delta_\square)$ we can find a sequence $i_1 < i_2 < \dots$ such that $(G_{m_{i_k}})_{k \in \mathbb{N}}$ is convergent in the $\delta_\square$-distance. Since $(G_{m_{i_k}})_{k \in \mathbb{N}}$ is a sequence of almost optimal graphs with increasing orders, its limit is necessarily $[W]$, the unique element of $\mathrm{LIM}(f, \mathcal{P})$. Likewise, we can find a subsequence $j_1 < j_2 < \dots$ of $(i_k)_{k \in \mathbb{N}}$ such that the graph sequence $(H_{n_{j_k}})_{k \in \mathbb{N}}$ converges to $[W]$ in $\delta_\square$. Clearly, the intertwined sequence $(G_{m_{j_1}}, H_{n_{j_1}}, G_{m_{j_2}}, H_{n_{j_2}}, \dots)$ still converges to $[W]$. By Corollary 18 the last sequence is Cauchy with respect to the $\delta_1$-distance. This contradicts (28) and finishes the proof of Theorem 15. ∎

6 The Erdős-Simonovits Stability Theorem

In this section, we will prove Theorem 1. For this purpose, we adapt the nice proof of Erdős [13] that every $K_{r+1}$-free graph $G$ is dominated by some $r$-partite graph $H$, that is, $V(H) = V(G)$ and $d_H(x) \ge d_G(x)$ for every $x \in V(G)$, where e.g. $d_H(x)$ denotes the degree of $x$ in $H$. In order to prove this, Erdős [13] uses induction on $r$ as follows. The case $r = 1$ is trivially true. Let $x$ be a vertex of maximum degree in $G$ and $V'$ be the set of neighbors of $x$. Then $G[V']$ is $K_r$-free, so by the induction assumption we can find an $(r-1)$-partite graph $H'$ that dominates $G[V']$. Let $H$ be the $r$-partite graph obtained from $H'$ by adding a new part on $V(G) \setminus V'$. It is not hard to check that $H$ is the required graph, see [13] for details.
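The induction of Erdős sketched above translates into a short recursive procedure. The following is a minimal sketch under stated assumptions: the function name and the adjacency-dictionary representation are hypothetical, the new part $V(G) \setminus V'$ is joined completely to $V'$ (so the output is read as a complete multipartite graph), and ties between vertices of maximum degree are broken by taking the smallest label.

```python
def dominating_parts(vertices, adj):
    """Parts of a complete multipartite graph H on `vertices`, built as in
    the induction described above.

    `adj` maps each vertex to the set of its neighbours in G. If G is
    K_{r+1}-free, at most r parts are returned, because the chosen
    maximum-degree vertices form a clique in G.
    """
    if not vertices:
        return []
    # x = vertex of maximum degree in the induced subgraph G[vertices]
    x = max(sorted(vertices), key=lambda v: len(adj[v] & vertices))
    neighbours = adj[x] & vertices  # this is V'
    # New part V \ V', followed by the parts of H' dominating G[V'].
    return [vertices - neighbours] + dominating_parts(neighbours, adj)


# Usage on the 5-cycle, which is K_3-free (so r = 2).
cycle = {v: {(v - 1) % 5, (v + 1) % 5} for v in range(5)}
parts = dominating_parts(set(range(5)), cycle)
degree_in_H = {v: 5 - len(part) for part in parts for v in part}
print(parts, degree_in_H)
```

In the complete multipartite graph $H$ a vertex in a part $P$ has degree $v(G) - |P|$, which is what `degree_in_H` records; domination means that this is at least the degree in $G$ for every vertex.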

Unfortunately, our proof of the graphon version of this degree-domination result (Theorem 20 in the previous version [33] of this manuscript) is quite long and complicated. Later, during a discussion with Peter Keevash, it was realized that if one is content to prove just Theorem 1 then the arguments dealing with graphons can be shortened. Here we present the shorter proof, referring the interested reader to [33] for the more general result.

Since we are going to apply the Fubini Theorem a few times, we state it here. For a function $W : I^2 \to \mathbb{R}$ and $x \in I$, let the section functions $W_x, W^x : I \to \mathbb{R}$ be defined by $W_x(y) = W(x, y)$ and $W^x(y) = W(y, x)$. Let $W_*(x) = \int_I W_x(y) \, d\mu(y)$ and $W^*(x) = \int_I W^x(y) \, d\mu(y)$ (and let it be arbitrary if the integral is undefined). Clearly, for a symmetric $W$, we have $W_x = W^x$ and $W_* = W^*$. Since $(I^2, \mathcal{L}_{I^2}, \mu_{I^2})$ is not the product $(I, \mathcal{L}_I, \mu_I) \times (I, \mathcal{L}_I, \mu_I)$ but its completion, we have to use the Fubini Theorem for Complete Measures ([15, Theorem 2.39]), which easily follows from the standard Fubini Theorem ([15, Theorem 2.37.a]), with the derivation being described in [15, Exercise 2.49].

Theorem 19 (The Fubini Theorem for the Lebesgue Measure) If $W \in L^1(I^2, \mathcal{L}_{I^2}, \mu_{I^2})$, then $W_x, W^x \in L^1(I, \mathcal{L}_I, \mu_I)$ for a.e. $x \in I$. Furthermore, $W_*, W^* \in L^1(I, \mathcal{L}_I, \mu_I)$ and

$$\int_{I^2} W(x, y) \, d\mu(x, y) = \int_I W_*(x) \, d\mu(x) = \int_I W^*(x) \, d\mu(x).$$ ∎

Let $W \in \mathcal{W}_I$ and $F$ be a graph on $[n]$. We call $W$ $F$-free if for every (not necessarily distinct) $x_1, \dots, x_n \in I$ there is a pair $\{i, j\} \in E(F)$ such that $W(x_i, x_j) = 0$. Equivalently, $W$ is $F$-free if and only if $W(x, x) = 0$ for every $x \in I$ and there is no homomorphism from $F$ to the infinite (uncountable) graph with vertex set $I$ in which $x, y$ are connected if $W(x, y) > 0$.

If $W \in \mathcal{W}_I$ is $F$-free, then $t(F, W) = 0$. The converse is not true: for example, fix distinct $x_1, \dots, x_n \in I$ and let $W(x, y) = 0$ except $W(x_i, x_j) = 1$ for all distinct $i, j \in [n]$. However, please note the following Lemma 20, which is a rewording of a special case of a result of Elek and Szegedy [12, Lemma 3.4].

Lemma 20 (The Infinite Removal Lemma) For every $W \in \mathcal{W}_I$ there is $U \in \mathcal{W}_I$ such that $W = U$ a.e. and for every graph $F$ either $t(F, U) > 0$ or $U$ is $F$-free.

Sketch of Proof. Let $Z$ be the Lebesgue set of $W$, as defined by (19). Clearly, $Z \subseteq I^2$ is symmetric. Let $U(x, y) = W(x, y)$ if $(x, y) \in Z$ and $U(x, y) = 0$ otherwise. Since $\mu(Z) = 1$, $U = W$ a.e. Also, if $x_1, \dots, x_n$ give an $F$-subgraph in $U$, then there is $c > 0$ such that for any $\{i, j\} \in E(F)$, the measure of

$$\{(x, y) \in (x_i - c, x_i + c) \times (x_j - c, x_j + c) : W(x, y) > W(x_i, x_j)/2 > 0\}$$

is at least a $(1 - \eta)$-fraction of the measure $4c^2$ of the whole square, for a suitably small constant $\eta > 0$. It follows that $t(F, W) > 0$. ∎

Remark. Note that $W = U$ a.e. implies that $t(F, U) = t(F, W)$ for every graph $F$.

For the rest of the section, fix an arbitrary family $\mathcal{F}$ of graphs. Recall that $\rho(G) = 2e(G)/(v(G))^2$ denotes the edge density and $\mathrm{Forb}(\mathcal{F})$ consists of all $\mathcal{F}$-free graphs. For a graphit $[W]$, define $\rho([W]) = t(K_2, [W])$. For convenience, we just write $\rho(W)$. This is compatible with the previous definition in the sense that for every graph $G$ we have $\rho(G) = \rho(W_G)$. Define $r$ by (5) and assume that $r \ge 1$.

Let $\mathcal{A}$ consist of those graphits $[W]$ that maximize $\rho(W)$ given that $t(F, W) = 0$ for every $F \in \mathcal{F}$. By the compactness of $(\mathcal{X}, \delta_\square)$ and the continuity of each function $t(F, -)$, the maximum is attainable. Denote this maximum value by $a$.

Lemma 21 $\mathrm{LIM}(\rho, \mathrm{Forb}(\mathcal{F})) = \mathcal{A}$.

Proof. Let $[W] \in \mathrm{LIM}(\rho, \mathrm{Forb}(\mathcal{F}))$. Pick a sequence of almost extremal graphs $(G_{n_i})$ convergent to $[W]$. Since each $G_{n_i}$ is $\mathcal{F}$-free, we have $t(F, W) = 0$ for each $F \in \mathcal{F}$ by the first definition of $t(F, W)$. We conclude that $\rho(W) \le a$.

Thus, in order to prove the lemma, it is enough to construct for every $[W] \in \mathcal{A}$ a sequence of $\mathcal{F}$-free graphs $(G_n)_{n \in \mathbb{N}}$ convergent to $[W]$ with $\rho(G_n) \ge a - o(1)$. Let $U \in [W]$ be obtained from $W$ by applying Lemma 20. For each integer $n$ we generate a random graph $G_n$ on $[n]$ as follows. Pick uniformly at random $n$ elements $x_1, \dots, x_n \in I$ and let a pair $\{i, j\}$ be an edge of $G_n$ with probability $U(x_i, x_j)$, with all $n + \binom{n}{2}$ random choices being mutually independent. With probability 1 we have that the sequence $(G_n)$ converges to $[U] = [W]$ (see [8, Theorem 4.5] or [28, Corollary 2.6]). Thus at least one such sequence $(G_n)$ exists. In particular, we have $\lim_{n \to \infty} \rho(G_n) = \rho(U) = a$. Also, since $U$ does not contain any copy of $F \in \mathcal{F}$, each $G_n$ is (surely) $\mathcal{F}$-free, as desired. ∎
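The sampling step in this proof is easy to simulate. The sketch below is an illustration only: the function names are hypothetical, and the graphon used is an assumed concrete instance, namely the complete bipartite graphon with parts $[0, 1/2)$ and $[1/2, 1]$, which is pointwise $K_3$-free. It draws $x_1, \dots, x_n$ uniformly, joins $i$ and $j$ with probability $U(x_i, x_j)$, and checks that the sample contains no triangle, as the proof asserts must happen surely.

```python
import random
from itertools import combinations


def complete_partite_graphon(r):
    """Graphon equal to 1 across the r parts [k/r, (k+1)/r) and 0 inside them."""
    def part(x):
        return min(int(x * r), r - 1)
    return lambda x, y: 1.0 if part(x) != part(y) else 0.0


def sample_graph(U, n, rng):
    """U-random graph on {0, ..., n-1}: pick x_i uniformly in [0, 1) and make
    {i, j} an edge independently with probability U(x_i, x_j)."""
    xs = [rng.random() for _ in range(n)]
    return {(i, j) for i, j in combinations(range(n), 2)
            if rng.random() < U(xs[i], xs[j])}


def has_clique(edges, n, k):
    """True if the graph on {0, ..., n-1} contains a clique on k vertices."""
    return any(all(pair in edges for pair in combinations(c, 2))
               for c in combinations(range(n), k))


rng = random.Random(1)
n = 40
edges = sample_graph(complete_partite_graphon(2), n, rng)
print(has_clique(edges, n, 3), 2 * len(edges) / n ** 2)
```

The second printed value is the edge density $\rho(G_n) = 2e(G_n)/n^2$, which for large $n$ should be close to $\rho(U) = 1/2$ for this particular graphon.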

Remark. The above proof, which is applicable to many other extremal problems, gives another justification why it is better not to restrict ourselves to extremal graphs when defining the limit set $\mathrm{LIM}(f, \mathcal{P})$.

Lemma 20 implies that

$$\mathcal{A} = \{[W] : \forall F \in \mathcal{F} \ \ F \not\subseteq W \text{ and } \rho(W) \text{ is maximum}\}. \tag{29}$$

Since $W_*$ is the analytic analog of the degree sequence, the following lemma can be informally rephrased as saying that extremal graphons are degree-regular. The combinatorial interpretation of the proof is that if we have too much discrepancy between degrees in an almost extremal graph $G$, then by deleting $\varepsilon v(G)$ vertices of smaller degree and cloning $\varepsilon v(G)$ vertices of larger degree, we would substantially increase the size of $G$, which would be a contradiction (provided we do not create any forbidden subgraph).

Lemma 22 For every $[W] \in \mathcal{A}$ we have $W_*(x) = a$ for a.e. $x \in I$.

Proof. The Fubini Theorem implies that if $W = U$ a.e., then $W_* = U_*$ a.e. (Indeed, if e.g. $W_* > U_*$ on a set $X \subseteq I$ of positive measure, then $\int_{X \times I} (W - U) = \int_X (W_* - U_*) > 0$, a contradiction.) Hence we can assume by Lemma 20 that $W$ is $\mathcal{F}$-free.

Suppose on the contrary that the lemma is false. Let $X_n = \{x \in I : W_*(x) \le a - 1/n\}$ and $Y_n = \{y \in I : W_*(y) \ge a + 1/n\}$. Note that e.g. $\bigcup_{n \in \mathbb{N}} X_n = \{x \in I : W_*(x) < a\}$. Since $\bigcup_{n \in \mathbb{N}} (X_n \cup Y_n)$ has positive measure, there is some $n$ with $\mu(X_n \cup Y_n) > 0$. Assume, for example, that $\mu(Y_n)$ is positive. By the Fubini Theorem (and $\int_{I^2} W = a$), we conclude that $\mu(\bigcup_{m \in \mathbb{N}} X_m) > 0$. By increasing $n$, assume that $c = \min(\mu(X_n), \mu(Y_n))$ is positive.

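Let $\varepsilon = \min(c, 1/(3n))$. By Lemma 5, we can find $\phi \in \Phi_0$ such that $\mu(\phi([0, \varepsilon]) \setminus X_n) = 0$ and $\mu(\phi([\varepsilon, 2\varepsilon]) \setminus Y_n) = 0$. Let $U = W^\phi$. Then $U \in [W]$ is still $\mathcal{F}$-free while $U_*(x)$ is at most $a - 1/n$ (resp. at least $a + 1/n$) for a.e. $x$ in the interval $[0, \varepsilon]$ (resp. $[\varepsilon, 2\varepsilon]$).

For $x \in I$, let $\psi(x) = x$ if $x \ge \varepsilon$ and $\psi(x) = x + \varepsilon$ if $x < \varepsilon$. Let $V = U^\psi \in \mathcal{W}_I$. (Although $\psi$ is not measure preserving, this definition makes perfect sense.) Note that $V$ is $\mathcal{F}$-free: if $x_1, \dots, x_m \in I$ induce a copy of $F$ in $V$, then $\psi(x_1), \dots, \psi(x_m)$ induce a copy of $F$ in $U$. Moreover,

$$\rho(V) = \int_I V_* \ge \int_I U_* - \int_{[0, \varepsilon]} U_* + \int_{[\varepsilon, 2\varepsilon]} U_* - (2\varepsilon)^2 \ge a - \varepsilon(a - 1/n) + \varepsilon(a + 1/n) - 4\varepsilon^2 > a.$$

This contradicts the maximality of $a$. ∎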

For disjoint measurable sets $A_1, \dots, A_r \subseteq I$, the complete $r$-partite graphon $K_{A_1, \dots, A_r}$ is the simple function from $I^2$ to $\{0, 1\}$ that assumes value 1 on $\bigcup_{i \in [r]} \bigcup_{j \in [r] \setminus \{i\}} A_i \times A_j$ and 0 on the remaining part of $I^2$. (In other words, $W(x, y) = 1$ if $x, y$ come from two different sets $A_i$ and 0 otherwise.) Clearly, $K_{A_1, \dots, A_r}$ is $K_{r+1}$-free.

Next, we prove that the graphon problem has a unique solution when we forbid the clique $K_{r+1}$ only.

Lemma 23 If $\mathcal{F} = \{K_{r+1}\}$ and $[W] \in \mathcal{A}$, then there is a partition $I = A_1 \cup \dots \cup A_r$ into sets of measure $1/r$ such that $W = K_{A_1, \dots, A_r}$ a.e.

Proof. We use induction on $r$ with the case $r = 1$ being trivially true.

Let $r \ge 2$. The $\mathcal{F}$-free graphon $W_{K_r}$ demonstrates that $a \ge (r-1)/r$. Let $[W] \in \mathcal{A}$. Assume that $W$ is $K_{r+1}$-free by (29) (that is, by Lemma 20). Pick $u \in I$ such that $W_*(u) = a$, which exists by Lemma 22. Let $B = \{w \in I : W(u, w) > 0\}$ and $A_1 = I \setminus B$.

Let $b = \mu(B)$. Since $W \le 1$, we have $b \ge a$. We are free to replace $W$ by $W^\phi$ with any $\phi \in \Phi_0$; thus we can assume by Lemma 5 that $\mu(B \,\triangle\, [0, b]) = 0$. The graphon $U(x, y) = W(bx, by)$ is $K_r$-free: if $x_1, \dots, x_r \in I$ induce $K_r$ in $U$, then $bx_1, \dots, bx_r, u \in I$ induce $K_{r+1}$ in $W$, a contradiction. Note that, by the Fubini Theorem,

$$a = \rho(W) = \int_{B^2} W + 2 \int_{A_1} W_* - \int_{A_1^2} W = b^2 \rho(U) + 2(1 - b)a - \int_{A_1^2} W. \tag{30}$$

The inductive assumption implies that $\rho(U) \le (r-2)/(r-1)$. Thus

$$\frac{r-1}{r} \le a \le \frac{r-2}{r-1}\, b^2 + 2(1 - b)a \le \frac{r-2}{r-1}\, b^2 + 2(1 - b)b.$$

Routine algebra implies that $a = b = (r-1)/r$ and all inequalities are in fact equalities.
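The routine algebra amounts to completing the square: the right-hand side of the chain equals $2b - \frac{r}{r-1} b^2$, and

$$\frac{r-1}{r} - \Big( \frac{r-2}{r-1}\, b^2 + 2(1-b)b \Big) = \frac{r}{r-1} \Big( b - \frac{r-1}{r} \Big)^2 \ge 0,$$

so the chain forces $b = (r-1)/r$ and then $a = (r-1)/r$. The following sketch, whose function names are chosen here for illustration, verifies this identity in exact rational arithmetic for small $r$ on a grid of values of $b$; it is a sanity check of the algebra rather than a proof.

```python
from fractions import Fraction


def upper_bound(r, b):
    """Right-hand side (r-2)/(r-1) * b^2 + 2(1-b)b of the displayed chain."""
    return Fraction(r - 2, r - 1) * b * b + 2 * (1 - b) * b


def gap(r, b):
    """(r-1)/r minus the upper bound; should equal r/(r-1) * (b - (r-1)/r)^2."""
    return Fraction(r - 1, r) - upper_bound(r, b)


for r in range(2, 7):
    best = Fraction(r - 1, r)
    grid = [Fraction(k, 60) for k in range(61)]
    # The completed-square identity holds exactly at every grid point.
    assert all(gap(r, b) == Fraction(r, r - 1) * (b - best) ** 2 for b in grid)
    # The gap vanishes only at b = (r-1)/r.
    assert gap(r, best) == 0
    assert all(gap(r, b) > 0 for b in grid if b != best)
```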

Thus $W(x, y) = 0$ for a.e. $(x, y) \in A_1^2$. Since $W_*(x) = a = 1 - \mu(A_1)$ for almost every $x \in A_1$, we have by the Fubini Theorem that $W(x, y) = 1$ for a.e. $(x, y) \in A_1 \times B$. Furthermore, by the uniqueness part of the induction assumption, $U = K_{B_2, \dots, B_r}$ a.e. for some equitable partition $I = B_2 \cup \dots \cup B_r$. Letting $A_i = \{bx : x \in B_i\}$ for $i = 2, \dots, r$, we get $W = K_{A_1, \dots, A_r}$ a.e., as required. ∎

Proof of Theorem 1. By Theorem 15, it suffices to show that for any $[W] \in \mathrm{LIM}(\rho, \mathrm{Forb}(\mathcal{F}))$ we have $\delta_\square(W, K_r) = 0$. By Lemma 21 and (29), we can assume that $W$ is $\mathcal{F}$-free.

Let us show that $W$ is $K_{r+1}$-free. Suppose on the contrary that $x_1, \dots, x_{r+1} \in I$ induce $K_{r+1}$ in $W$. Select $F \in \mathcal{F}$ of chromatic number $\chi(F) = r + 1$ and fix a proper coloring $c : V(F) \to [r + 1]$. Then the map $f : V(F) \to I$ with $f(u) = x_{c(u)}$ shows that $F \subseteq W$, a contradiction.

Turán graphs $T_r(n)$ (or the graphon $W_{K_r}$ and Lemma 21) show that $\rho(W) \ge (r-1)/r$. By Lemma 23 we have that $\rho(W) \le \rho(K_r) = (r-1)/r$. Thus $[W]$ is extremal for the $(\rho, \mathrm{Forb}(\{K_{r+1}\}))$-problem and (again by Lemma 23) is equal to $[W_{K_r}]$, as required. ∎